Does it conform to the standard for a compiler that supports Shift_JIS to treat the yen sign as a backslash?

Asked 1 year ago, Updated 1 year ago, 44 views

Shift_JIS has no backslash character (the so-called half-width one); a yen sign (¥) occupies the code position that ASCII assigns to the backslash.
As far as I know, C/C++ compilers that accept Shift_JIS-encoded source code treat this yen sign the same as a backslash.
Does this behavior conform to the standard?

For example, given "¥n", I wondered whether the correct reading is "a yen sign followed by n", and whether interpreting it as a line feed is actually incorrect according to the standard.
In Shift_JIS, where no backslash exists, wouldn't strict conformance to the specification mean that only the trigraph sequence ??/ is interpreted as a backslash?

Or do ISO, JIS, or other standards contain a provision saying that the character at the same code position as the ASCII backslash is to be interpreted as a backslash?
I would like answers based on a strict reading of the specifications, not on compiler implementations or conventions.

c++ c

2022-09-30 21:18

4 Answers

As far as I know, C/C++ compilers that accept Shift_JIS-encoded source code treat this yen sign the same as a backslash

Could it be that you did not specify the character encoding of the source file?
GCC, which is very widely used, appears to interpret the character correctly (?) as a yen sign, and compilation errors occur wherever a backslash was intended.

#include<stdio.h>

int main()
{
    printf("I'm scared of the table\nI'm shaking my ability\n");
    return 0;
}

CP932:

# gcc -Wall -finput-charset=cp932 charset.c

# ./a.exe
I'm scared of the table
I'm shaking my ability

Shift_JIS:

# gcc -Wall -finput-charset=shift_jis charset.c
/usr/include/sys/features.h:38:4: error: expected identifier or '(' before numeric constant
  ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
    ^
In file included from /usr/include/sys/config.h:5:0,
                 from /usr/include/_ansi.h:16,
                 from /usr/include/stdio.h:29,
                 from charset.c:1:
(More error messages follow.)

Location of the error:

#define __GNUC_PREREQ(maj, min) \
    ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))

(Addendum)
I removed the #include of <stdio.h> so that I could check how printf() itself behaves.

//#include <stdio.h>
int printf(const char *format, ...);

int main()
{
    printf("I'm scared of the table\nI'm shaking my ability\n");
    printf("I'm scared of the table??/nI'm shaking my ability??/n");
    return 0;
}
# gcc -Wall -finput-charset=shift_jis -trigraphs charset.c
charset.c: In function 'main':
charset.c:7:22: warning: trigraph ??/ converted to \ [-Wtrigraphs]
  printf("\|??/n\k??/n";
 ^
charset.c:7:39: warning: trigraph ??/ converted to \ [-Wtrigraphs]

# ./a.exe 
I'm scared of the table¥nI'm shaking my ability¥nI'm scared of the table
I'm shaking my ability
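
The trigraph replacement that GCC reports in the warnings above can be sketched roughly as follows. This is only an illustration: the function name replace_backslash_trigraph is hypothetical, it handles just the one trigraph that stands for a backslash (a real preprocessor handles all nine), and it is not how GCC is actually implemented. The question marks are compared character by character so that the sketch itself contains no trigraph.

```c
#include <stddef.h>

/* Hypothetical helper: rewrite a NUL-terminated buffer in place so that
   every "question mark, question mark, slash" sequence becomes a single
   backslash. Returns the number of replacements made. */
size_t replace_backslash_trigraph(char *buf)
{
    size_t in = 0, out = 0, replaced = 0;

    while (buf[in] != '\0') {
        /* && short-circuits, so we never read past the terminating NUL. */
        if (buf[in] == '?' && buf[in + 1] == '?' && buf[in + 2] == '/') {
            buf[out++] = '\\';
            in += 3;
            replaced++;
        } else {
            buf[out++] = buf[in++];
        }
    }
    buf[out] = '\0';
    return replaced;
}
```

Because the replacement yields a real backslash character regardless of what byte 0x5C means in the source encoding, the escape sequence after it is then recognized as usual.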


2022-09-30 21:18

In strict Shift_JIS (JIS X 0208 Annex 1), code positions 0x20-0x7F are the same as in JIS X 0201, so the character at 0x5C is U+00A5 (yen sign) and should not be interpreted as U+005C (backslash).

The catch lies not in "compiler implementation or convention" but in the phrase "Shift_JIS-encoded source". What people call Shift_JIS-encoded source usually means IANA Windows-31J (Microsoft CP932) rather than JIS X 0208 Annex 1. In Windows-31J the 7-bit code range is defined to match ASCII, so code position 0x5C is interpreted as U+005C. (For historical reasons, only the font glyph is a yen sign.)
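
As a minimal sketch of the difference between the two encodings, the following C function maps a 7-bit byte to a Unicode code point under each one. The enum and function names are hypothetical and exist only for this illustration. The 0x7E entry (U+203E, overline) is the other position where JIS X 0201 differs from ASCII.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not taken from any real library. */
enum seven_bit_charset {
    CHARSET_JIS_X_0201,  /* 7-bit range of strict Shift_JIS */
    CHARSET_WINDOWS_31J  /* Microsoft CP932: 7-bit range matches ASCII */
};

/* Map one byte in the range 0x00-0x7F to a Unicode code point. */
uint32_t decode_7bit(unsigned char byte, enum seven_bit_charset cs)
{
    if (cs == CHARSET_JIS_X_0201) {
        if (byte == 0x5C) return 0x00A5; /* YEN SIGN */
        if (byte == 0x7E) return 0x203E; /* OVERLINE */
    }
    return byte; /* every other position is identical to ASCII */
}
```

Under this sketch, decode_7bit(0x5C, CHARSET_JIS_X_0201) gives U+00A5 while decode_7bit(0x5C, CHARSET_WINDOWS_31J) gives U+005C, which is the whole difference the compiler sees.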


2022-09-30 21:18

In the Japanese editions of the C standard, JIS X 3010:2003 and JIS X 3010-1993, the informative note in 5.2.1 Character sets, under 5.2 Environmental considerations, contains the following wording:

The two elements \ and ~ of the basic character set specified in this standard are replaced in JIS X 0201 by ￥ (yen sign) and ￣ (overline), respectively.

(Note by the poster: the original text uses half-width characters, but I have written them in full-width here to avoid confusion when posting.)

Therefore, even if the substitution is not recognized at the ISO/IEC 9899 level, JIS X 3010 does recognize it, so I think it is safe to treat it as permitted.

I found no equivalent wording in 2.2 Character sets of the C++ language specification JIS X 3014:2003.
I do not have any other edition at hand, so there may be something elsewhere.

Given the C++03 design philosophy [strict compatibility with C89], I personally think you can expect the same in C++ even though the language specification does not say so explicitly.


2022-09-30 21:18

According to the C++ standard, physical source files are first converted to a logical source that uses only the 96 characters of the basic source character set. This conversion is implementation-defined.

Therefore, as far as the standard is concerned, either interpretation is acceptable.
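
To illustrate why the implementation-defined mapping decides the outcome, here is a rough sketch in C. The function name and the byte_5c_is_backslash flag are hypothetical stand-ins for the implementation's choice in the first translation phase. The sketch looks only at single bytes and ignores double-byte characters, so it is not a real lexer.

```c
#include <stddef.h>

/* Hypothetical sketch: count how many newline escapes a lexer would
   recognize in src, depending on whether the implementation maps the
   physical byte 0x5C to the logical backslash character. */
size_t count_newline_escapes(const unsigned char *src, size_t len,
                             int byte_5c_is_backslash)
{
    size_t count = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        /* Stand-in for translation phase 1: what does 0x5C become? */
        int is_backslash = (src[i] == 0x5C) && byte_5c_is_backslash;

        if (is_backslash) {
            if (src[i + 1] == 'n')
                count++;
            i++; /* skip the escaped character */
        }
    }
    return count;
}
```

With the same input bytes, passing 1 for the flag finds the escapes and passing 0 finds none. Both results are consistent with the standard, because the standard leaves the mapping itself to the implementation.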


2022-09-30 21:18
