This is my first time asking a question.
Characters get garbled when I create a directory with a Japanese name on Solaris 10.
Does anyone know of a similar phenomenon or a way to avoid it?
Eventually I plan to stop using Japanese file names, but for now I would like to do something about the existing files.
The environment where the garbling occurs and the steps to reproduce it are as follows:
$ uname -a
SunOS osc20068 5.10 Generic_147440-27 sun4v sparc sun4v
$ cat /etc/release
Oracle Solaris 10 8/11 s10s_u10wos_17b SPARC
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Assembled 23 August 2011
$ echo $LANG
ja_JP.UTF-8
$
$ mkdir 品目
$ ls
[garbled]
$
Here is the hex dump I took:
$ mkdir a品a b目b c品目c
$ ls
a品a  b目b  c[garbled]c
$ ls -d a*
a品a
$ ls -d a* | od -cx
0000000   a 345 223 201   a  \n
           61e5    9381    610a
0000006
$ ls -d b*
b目b
$ ls -d b* | od -cx
0000000   b 347 233 256   b  \n
           62e7    9bae    620a
0000006
$ ls -d c*
c[garbled]c
$ ls -d c* | od -cx
0000000   c 345 223 207 222 233 256   c  \n
           63e5    9387    929b    ae63    0a00
0000011
Here is an additional dump.
$ ls
a品a  b目b  c[garbled]c
$ LANG=C ls | od -tx1
0000000 61 e5 93 81 61 0a 62 e7 9b ae 62 0a 63 e5 93 87
0000020 92 9b ae 63 0a
0000025
$
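To pin down exactly which bytes changed, the `LANG=C ls | od -tx1` output can be compared against the UTF-8 encoding of the names given to `mkdir`. This is a minimal sketch, assuming Python 3.6+ is available on some machine (the dump is simply pasted in as text); `parse_od_tx1` and `diff_names` are hypothetical helper names.

```python
# Compare the bytes actually stored on disk (pasted from `LANG=C ls | od -tx1`)
# with the UTF-8 encoding of the names that were passed to mkdir.
DUMP = """\
0000000 61 e5 93 81 61 0a 62 e7 9b ae 62 0a 63 e5 93 87
0000020 92 9b ae 63 0a
0000025
"""
EXPECTED = ["a品a", "b目b", "c品目c"]


def parse_od_tx1(dump):
    """Rebuild raw bytes from `od -tx1` output, dropping the offset column."""
    data = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        data.extend(int(h, 16) for h in fields[1:])
    return bytes(data)


def diff_names(dump, expected):
    """Return (name, [(offset, expected_byte, stored_byte), ...]) per name."""
    stored_names = parse_od_tx1(dump).split(b"\n")[:-1]  # ls output ends with "\n"
    result = []
    for name, stored in zip(expected, stored_names):
        want = name.encode("utf-8")
        # Position-by-position comparison; assumes the lengths match,
        # which holds for this dump.
        diffs = [(i, w, s) for i, (w, s) in enumerate(zip(want, stored)) if w != s]
        result.append((name, diffs))
    return result


for name, diffs in diff_names(DUMP, EXPECTED):
    print(name, [(i, f"{w:02x}", f"{s:02x}") for i, w, s in diffs] or "OK")
```

For this dump it reports a品a and b目b as OK, and c品目c as differing at byte offsets 3 and 4: the stored name has 0x87 0x92 where UTF-8 品目 has 0x81 0xe7.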
When I wrote the echo output to a file and looked at history, the characters were not garbled.
$ ls
a品a  b目b  c[garbled]c
$ echo a品a b目b c品目c > tmp.txt; cat tmp.txt
a品a b目b c品目c
$ history | grep mkdir
  503  mkdir a品a b目b c品目c
  510  history | grep mkdir
$
Files get garbled as well, not just directories.
$ touch 品目
$ ls
[garbled]
$
If I redirect the garbled ls output to a file, the file contents are garbled too.
This may be a file system problem.
$ ls
[garbled]
$ ls > tmp.txt
$ cat tmp.txt
tmp.txt
article
eye
� high school
$
First, check that $LANG is really an environment variable (i.e. exported in the shell), that $LC_ALL is not set, and that neither value contains invisible characters.
$ env | egrep '^(LANG|LC_ALL)='
$ printf '%s' "$LANG" | cat -v
$ printf '%s' "$LC_ALL" | cat -v
It could also be a file system issue, so please check the mount source and mount options.
$ nawk -v m="`df . | sed 's/ .*//'`" '$3==m{print}' /etc/vfstab
$ mount | grep "^`df . | sed 's/ .*//'`"
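For reference, here is the same vfstab lookup sketched in Python, assuming the standard seven-column `/etc/vfstab` layout (device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, mount options). `mount_point_of` and `vfstab_entry` are hypothetical helpers; the first replaces `df . | sed 's/ .*//'` by walking up the directory tree until `os.path.ismount()` is true.

```python
import os

VFSTAB_FIELDS = ("device_to_mount", "device_to_fsck", "mount_point",
                 "fs_type", "fsck_pass", "mount_at_boot", "mount_options")


def mount_point_of(path):
    """Walk up from path until reaching a mount point (like `df .`'s first field)."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def vfstab_entry(mount_point, vfstab_text):
    """Return the vfstab line for mount_point as a dict, or None if absent."""
    for line in vfstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and lines that are not 7 columns.
        if len(fields) != 7 or fields[0].startswith("#"):
            continue
        if fields[2] == mount_point:
            return dict(zip(VFSTAB_FIELDS, fields))
    return None


if __name__ == "__main__":
    mp = mount_point_of(".")
    if os.path.exists("/etc/vfstab"):
        with open("/etc/vfstab") as f:
            print(mp, vfstab_entry(mp, f.read()))
    else:
        print(mp, "(no /etc/vfstab on this system)")
```

If nothing is found, the file system may simply not be managed through vfstab (ZFS datasets typically are not), in which case the `mount` output is the more useful of the two.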
That's all for now. I can't tell yet what the cause is, or what the information gathered so far points to.
I don't know the cause, but if it might be triggered by a specific byte value (like the 0x5C problem in Shift_JIS), you can change the file name one byte at a time and see which byte change makes the garbling go away.
Example:
# utf-8: strings that differ by only 1 byte
b'\xe6\x93\x81\xe7\x9b\xae'
b'\xe5\x94\x81\xe7\x9b\xae'
b'\xe5\x93\x82\xe7\x9b\xae'
b'\xe5\x93\x81\xe8\x9b\xae'
b'\xe5\x93\x81\xe7\x9c\xae'
b'\xe5\x93\x81\xe7\x9b\xaf'
# sjis: strings that differ by only 1 byte
b'\x96i\x96\xda'
b'\x95j\x96\xda'
b'\x95i\x97\xda'
b'\x95i\x96\xdb'
# euc-jp: strings that differ by only 1 byte
b'\xca\xca\xcc\xdc'
b'\xc9\xcb\xcc\xdc'
b'\xc9\xca\xcd\xdc'
b'\xc9\xca\xcc\xdd'
# iso-2022-jp: strings that differ by only 1 byte
b'\x1b$BJJL\\\x1b(B'
b'\x1b$BIKL\\\x1b(B'
b'\x1b$BIJM\\\x1b(B'
b'\x1b$BIJL]\x1b(B'
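Script used to generate the example output:
#!/usr/bin/python3
orig = "品目"
for enc in "utf-8", "sjis", "euc-jp", "iso-2022-jp":
    print(f"# {enc}: strings that differ by only 1 byte")
    new_byte_list = list(bytes(orig, enc))
    for i in range(len(new_byte_list)):
        # Keep the ISO-2022-JP escape sequences (ESC $ B ... ESC ( B) intact
        if enc == "iso-2022-jp" and (i < 3 or i > len(new_byte_list) - 4):
            continue
        new_byte_list[i] += 1  # bump one byte
        b = bytes(new_byte_list)
        print(str(b, enc), "", b)  # decoded string, then the raw bytes
        new_byte_list[i] -= 1  # restore it before moving to the next position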
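The one-byte-at-a-time experiment can also be automated: create a directory for the original name and for every one-byte variant, then check whether each name read back from the file system is byte-for-byte identical. This is a minimal sketch, assuming Python 3.6+ on the affected machine; `one_byte_variants` and `find_mangled_names` are hypothetical helpers, and unlike the script above it only varies the UTF-8 bytes. Passing `bytes` paths to `os.mkdir` and `os.listdir` means no locale decoding happens on the Python side.

```python
import os
import tempfile


def one_byte_variants(data):
    """Yield every copy of data with exactly one byte incremented by 1."""
    for i in range(len(data)):
        if data[i] == 0xFF:
            continue  # incrementing would overflow the byte
        buf = bytearray(data)
        buf[i] += 1
        if b"/" not in buf:  # "/" cannot appear in a file name
            yield bytes(buf)


def find_mangled_names(names, base_dir):
    """mkdir each name under base_dir; return names rejected or not stored verbatim."""
    rejected = []
    for name in names:
        try:
            os.mkdir(os.path.join(base_dir, name))
        except OSError:
            rejected.append(name)  # refused outright by the file system
    stored = set(os.listdir(base_dir))  # bytes in, bytes out: no decoding
    return rejected + [n for n in names if n not in rejected and n not in stored]


if __name__ == "__main__":
    orig = "品目".encode("utf-8")
    candidates = [orig] + list(one_byte_variants(orig))
    # Scratch directory under the current directory, so the test runs on the
    # same file system as the garbled files; it is removed afterwards.
    with tempfile.TemporaryDirectory(dir=".") as tmp:
        for name in find_mangled_names(candidates, os.fsencode(tmp)):
            print("rejected or stored differently:", name)
```

Names that are stored unchanged produce no output, so whatever is printed narrows down which byte values trigger the problem.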