File names are garbled on Solaris 10

This is my first time asking a question.

Characters get garbled when I create a directory with a Japanese name in a Solaris 10 environment.
Does anyone know of a similar phenomenon, or a way to avoid it?
In the end I plan to stop using Japanese file names, but for now I would like to do something about the existing files.

The environment where the garbling occurs and the steps to reproduce it are as follows:

$ uname -a
SunOS osc20068 5.10 Generic_147440-27 sun4v sparc sun4v
$ cat /etc/release
                   Oracle Solaris 10 8/11 s10s_u10wos_17b SPARC
  Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
                            Assembly 23 August 2011
$ echo $LANG
ja_JP.UTF-8
$

$ mkdir 品目
$ ls
item of high quality
$

I have added the hexadecimal dumps that were requested.

$ mkdir a品a b目b c品目c
$ ls
(a) Item a b b b c c c c
$ ls -d a*
a品a
$ ls -d a* | od -cx
0000000   a 345 223 201   a  \n
           61e5    9381    610a
0000006
$ ls -d b*
b目b
$ ls -d b* | od -cx
0000000   b 347 233 256   b  \n
           62e7    9bae    620a
0000006
$ ls -d c*
c � high-ranking c
$ ls -d c* | od -cx
0000000   c 345 223 207 222 233 256   c  \n
           63e5    9387    929b    ae63    0a00
0000011

This is an additional dump.

$ ls
(a) Item a b b b c c c c
$ LANG=C ls | od -tx1
0000000 61 e5 93 81 61 0a 62 e7 9b ae 62 0a 63 e5 93 87
0000020 92 9b ae 63 0a
0000025
$

When I saved the names to a file and displayed it, and when I looked at the output of history, the characters were not garbled.

$ ls
(a) Item a b b b c c c c
$ echo a品a b目b c品目c > tmp.txt; cat tmp.txt
a品a b目b c品目c
$ history | grep mkdir
  503 mkdir a品a b目b c品目c
  510 history | grep mkdir
$

A regular file was garbled in the same way.

$ touch 品目
$ ls
item of high quality
$

When I redirect the garbled ls output to a file, the contents of the file are garbled as well.
This may be a file system problem.

$ ls
item of high quality
$ ls > tmp.txt
$ cat tmp.txt
tmp.txt
品
目
� high school
$

shellscript solaris

2022-09-30 21:15

2 Answers

First, check that $LANG is actually an environment variable (exported from the shell), that $LC_ALL is not set, and that neither value contains invisible characters.

$ env | egrep '^(LANG|LC_ALL)='
$ printf '%s' "$LANG" | cat -v
$ printf '%s' "$LC_ALL" | cat -v
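The same check can be sketched in Python, in case it is easier to read than the cat -v output. This is only an illustration: the function name inspect_locale_vars is made up here, and it only sees variables that were actually exported to the Python process. It reports each value through repr(), so a trailing space or control character becomes visible.

```python
import os

def inspect_locale_vars(env, names=("LANG", "LC_ALL")):
    """Return {name: (repr_of_value, is_clean)}; unset variables map to None.

    inspect_locale_vars is a hypothetical helper written for this answer.
    """
    report = {}
    for name in names:
        value = env.get(name)
        if value is None:
            report[name] = None
            continue
        # "Clean" here means non-empty and made only of visible ASCII
        # characters (no spaces, tabs, or control characters).
        clean = all(33 <= ord(ch) <= 126 for ch in value) and value != ""
        report[name] = (repr(value), clean)
    return report

if __name__ == "__main__":
    for name, info in inspect_locale_vars(os.environ).items():
        print(name, "is not set" if info is None else info)
```

A value such as 'ja_JP.UTF-8 ' (with a trailing space) would be reported as not clean, while 'ja_JP.UTF-8' would pass.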

This may be a file system issue, so please also check the mount source and the mount options.

$ nawk -v m="`df . | sed 's/ .*//'`" '$3==m{print}' /etc/vfstab
$ mount | grep "^`df . | sed 's/ .*//'`"
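To see whether the file system itself changes the bytes, independently of the shell and the terminal, one option is to create a name from known bytes and read the directory back as raw bytes. The following is a minimal sketch, assuming python3 is available on the machine; roundtrip_name is a name invented for this example, and the scratch directory is created under the current directory so that it lives on the file system being tested.

```python
import os
import tempfile

def roundtrip_name(parent, name):
    """Create a directory called `name` under `parent` and return
    (bytes requested, list of entry names read back as bytes).

    roundtrip_name is a hypothetical helper written for this answer.
    """
    requested = name.encode("utf-8")
    parent_bytes = os.fsencode(parent)
    os.mkdir(os.path.join(parent_bytes, requested))
    # Passing a bytes path makes listdir return the raw stored names.
    stored = os.listdir(parent_bytes)
    return requested, stored

if __name__ == "__main__":
    # The scratch directory is removed again when the block exits.
    with tempfile.TemporaryDirectory(dir=".") as scratch:
        requested, stored = roundtrip_name(scratch, "品目")
        print("requested:", requested.hex())
        for entry in stored:
            status = "same" if entry == requested else "DIFFERENT"
            print("stored:   ", entry.hex(), status)
```

If the stored bytes already differ from the requested ones here, the terminal and the shell are probably not involved.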

That's all for now. At this point the cause is unclear, and it is also unclear what the information gathered so far tells us.

2022-09-30 21:15

I don't know the cause, but if it might be triggered by a specific byte, like the 0x5c problem in Shift_JIS, you could change the file name one byte at a time and observe which byte has to change for the garbling to go away.

Example:

# utf-8 strings that differ by only 1 byte
擁目 b'\xe6\x93\x81\xe7\x9b\xae'
唁目 b'\xe5\x94\x81\xe7\x9b\xae'
哂目 b'\xe5\x93\x82\xe7\x9b\xae'
品蛮 b'\xe5\x93\x81\xe8\x9b\xae'
品眮 b'\xe5\x93\x81\xe7\x9c\xae'
品盯 b'\xe5\x93\x81\xe7\x9b\xaf'
# sjis strings that differ by only 1 byte
吠目 b'\x96i\x96\xda'
彬目 b'\x95j\x96\xda'
品瑠 b'\x95i\x97\xda'
品杢 b'\x95i\x96\xdb'
# euc-jp strings that differ by only 1 byte
癖目 b'\xca\xca\xcc\xdc'
彬目 b'\xc9\xcb\xcc\xdc'
品養 b'\xc9\xca\xcd\xdc'
品杢 b'\xc9\xca\xcc\xdd'
# iso-2022-jp strings that differ by only 1 byte
癖目 b'\x1b$BJJL\\\x1b(B'
彬目 b'\x1b$BIKL\\\x1b(B'
品養 b'\x1b$BIJM\\\x1b(B'
品杢 b'\x1b$BIJL]\x1b(B'

Script used to output examples:

#!/usr/bin/python3
orig = "品目"
for enc in "utf-8", "sjis", "euc-jp", "iso-2022-jp":
    print("#", enc, "strings that differ by only 1 byte")
    new_byte_list = list(bytes(orig, enc))
    for i in range(len(new_byte_list)):
        # Leave the iso-2022-jp escape sequences at both ends untouched
        if enc == "iso-2022-jp" and (i < 3 or i > len(new_byte_list) - 4):
            continue
        # Bump one byte, print the decoded string and its bytes, then restore it
        new_byte_list[i] += 1
        b = bytes(new_byte_list)
        print(str(b, enc), "", b)
        new_byte_list[i] -= 1
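The same byte-level view can be applied to the dump in the question. The sketch below compares the UTF-8 bytes of the name that was presumably typed with the bytes that ls reported for c*. Two things are assumptions on my part: that the intended name was c品目c (which is consistent with the dumps for a* and b*), and the helper name byte_diff, which is just made up for this example.

```python
def byte_diff(expected, observed):
    """List (offset, expected_byte, observed_byte) wherever two
    equal-length byte strings differ. Hypothetical helper for this answer."""
    if len(expected) != len(observed):
        raise ValueError("lengths differ")
    return [(i, e, o)
            for i, (e, o) in enumerate(zip(expected, observed))
            if e != o]

# Assumed intended name, encoded as UTF-8.
expected = "c品目c".encode("utf-8")
# Bytes taken from the od dump of c* in the question, trailing newline dropped.
observed = bytes.fromhex("63e59387929bae63")

for offset, e, o in byte_diff(expected, observed):
    print(f"offset {offset}: expected {e:02x}, found {o:02x}")
```

Under that assumption only two adjacent bytes differ: 81 e7 (the last byte of 品 and the first byte of 目) has become 87 92. The single-kanji names a品a and b目b were stored unchanged, so the problem seems to be tied to that particular byte pair rather than to either character on its own.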

2022-09-30 21:15
