I want to split a Japanese string into individual characters in C++.

Asked 2 years ago, Updated 2 years ago, 132 views

I am using C++11 to write a program that reads Japanese strings as input, splits them into individual characters, and counts each character. This is my first time handling Japanese text, so I don't know enough about it. If possible, I would like to iterate over the string with a range-based for.

c++ japanese

2022-09-30 19:17

2 Answers

std::string, std::wstring, std::u16string, and std::u32string are provided, but each one only handles sequences of its own character type (char, wchar_t, char16_t, and char32_t, respectively). For example, C libraries often offer case-insensitive comparisons such as stricmp (a common extension, not part of the C standard), but C++ strings do not have even that.
Everything else is left to the library user. With multi-byte characters, keeping track of character boundaries is the user's responsibility. You can handle UTF-16 with std::u16string, but dealing with surrogate pairs is still the user's responsibility. And a character followed by a variation selector does not fit into a single element, even in std::u32string.
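
As a rough illustration of what handling surrogate pairs yourself can look like, here is a minimal sketch that turns a std::u16string into a list of code points. The function name to_code_points is made up for this example, and the code assumes mostly well-formed UTF-16; an unpaired surrogate is simply passed through unchanged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 into code points, joining surrogate pairs.
// Assumption: input is well-formed; an unpaired surrogate is copied as-is.
std::vector<char32_t> to_code_points(const std::u16string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        const bool is_high = (c >= 0xD800 && c <= 0xDBFF);
        if (is_high && i + 1 < s.size()) {
            const char32_t low = s[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine the high and low surrogate into one code point.
                c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        out.push_back(c);
    }
    return out;
}
```

For a string such as u"\u65E5\U00020B9F", which is stored as three UTF-16 code units, this returns two code points. Variation selectors and other combining sequences are not merged by this sketch.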

In the end, it is up to the user to decide what counts as "one character" and to write the code that handles it accordingly.
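
If you decide that "one character" means one Unicode code point and your input is UTF-8 in a std::string, splitting and counting could be sketched as follows. This is only one possible approach: the names utf8_length, split_utf8, and count_chars are invented for this example, and the code assumes valid UTF-8 without doing real validation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with lead byte b.
// Assumption: valid UTF-8; an unexpected byte is treated as a 1-byte sequence.
std::size_t utf8_length(unsigned char b) {
    if (b >= 0xF0) return 4;  // 11110xxx
    if (b >= 0xE0) return 3;  // 1110xxxx (most Japanese characters)
    if (b >= 0xC0) return 2;  // 110xxxxx
    return 1;                 // ASCII, or a stray continuation byte
}

// Split a UTF-8 string into one std::string per code point.
std::vector<std::string> split_utf8(const std::string& s) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t n = utf8_length(static_cast<unsigned char>(s[i]));
        if (i + n > s.size()) n = s.size() - i;  // truncated tail
        chars.push_back(s.substr(i, n));
        i += n;
    }
    return chars;
}

// Count how often each code point occurs, using a range-based for.
std::map<std::string, int> count_chars(const std::string& s) {
    std::map<std::string, int> counts;
    for (const std::string& ch : split_utf8(s)) {
        ++counts[ch];
    }
    return counts;
}
```

Whether the input really is UTF-8 depends on your platform and terminal settings; with Shift-JIS input the lead-byte rules are different and this sketch does not apply.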


2022-09-30 19:17

The C/C++ language itself has no concept of a string other than string literals.
There are only arrays of a type that holds character codes.
Also, ANSI code pages (Shift-JIS for Japanese, also known as MBCS) and Unicode are handled differently.

char Nihongo_A[7] = "日本語"; // Japanese text in a char array (when the encoding is Shift-JIS)
wchar_t Nihongo_W[4] = L"日本語"; // Japanese text in a wide character array as Unicode (for example, UTF-16)

Both hold the same Japanese text, but what counts as one character depends on the character encoding, so the number of array elements required differs as shown above.
Apart from some exceptions (for example, surrogate pairs in UTF-16), one wchar_t element corresponds to one character, so you can use an array index to examine a single character.

 if (Nihongo_W[1] == L'本') {
   // The second character was '本'.
 }
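
To make the differing element counts concrete, here is a small compile-time check using the C++11 literal prefixes u8, u, and U instead of Shift-JIS and wchar_t. The characters are written as universal character names (\u65E5\u672C\u8A9E is 日本語) so the result does not depend on how the source file is saved; element_count is a helper invented for this sketch.

```cpp
#include <cstddef>

// The same three characters in two Unicode encodings.
const char16_t text16[] = u"\u65E5\u672C\u8A9E";  // UTF-16
const char32_t text32[] = U"\u65E5\u672C\u8A9E";  // UTF-32

// Hypothetical helper: element count of a built-in array,
// including the terminating null.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

static_assert(element_count(text16) == 4, "3 UTF-16 code units + null");
static_assert(element_count(text32) == 4, "3 UTF-32 code units + null");
// Each of these characters takes 3 bytes in UTF-8, so 9 bytes + null.
static_assert(sizeof(u8"\u65E5\u672C\u8A9E") == 10, "9 UTF-8 bytes + null");
// Indexing picks out one character when each one fits in a single element.
static_assert(u"\u65E5\u672C\u8A9E"[1] == u'\u672C', "second character");
```

In Shift-JIS each of these characters takes 2 bytes, which is where the 7 in the char array above comes from.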

Of course, you can also use a range-based for on plain arrays.

 for (wchar_t& jc : Nihongo_W) { // also visits the terminating L'\0'
   if (jc != L'\0') jc = L'あ'; // make every character 'あ', but keep the terminator
 }

Alternatively, there are higher-level "string classes" that provide this functionality, so you should use those.
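
As one example of such a class, the standard std::u32string works directly with a range-based for. The sketch below assumes the text has already been converted to UTF-32 and that one code point is an acceptable definition of "one character"; count_code_points is a name invented for this example.

```cpp
#include <map>
#include <string>

// Hypothetical helper: count each code point of a UTF-32 string.
// Each char32_t element is one code point; sequences such as a character
// plus a variation selector still span several elements.
std::map<char32_t, int> count_code_points(const std::u32string& text) {
    std::map<char32_t, int> counts;
    for (char32_t c : text) {  // range-based for over the string
        ++counts[c];
    }
    return counts;
}
```

For U"\u65E5\u672C\u65E5" (日本日) this yields two entries, with U+65E5 counted twice and U+672C once.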


2022-09-30 19:17



© 2024 OneMinuteCode. All rights reserved.