Hello
I'm trying to read a Shift_JIS (SJIS-win) CSV that contains both Chinese and Japanese text, as shown below, in PHP, but only the Chinese comes out garbled. How can I read it as UTF-8?
[FYI] For project reasons the CSV itself has to be saved as SJIS-win, so I want to convert SJIS-win -> UTF-8.
■ csv
word, Chinese translation
Hi, 你好
Thank you, 谢啦谢
■ csv hex dump
$ hexdump -x ./test.csv
0000000 be8c 7497 922c 8d86 8c91 96ea 0af3 b182
0000010 f182 c982 bf82 cd82 3f2c 448d 820a 82a0
0000020 82e8 82aa 82c6 2ca4 3f3f 0a3f
000002c
$ xxd ./test.csv
00000000: 8cbe 9774 2c92 868d 918c ea96 f30a 82b1  ...t,...........
00000010: 82f1 82c9 82bf 82cd 2c3f 8d44 0a82 a082  ........,?.D....
00000020: e882 aa82 c682 a42c 3f3f 3f0a            .......,???.
■ Source
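<?php
// Read the CSV given as the first argument and convert each field
// from SJIS-win to UTF-8 before printing it.
if (($handle = fopen($argv[1], "r")) !== FALSE) {
    while (($data = fgetcsv($handle)) !== FALSE) {
        foreach ($data as $value) {
            mb_convert_variables('utf-8', 'sjis-win', $value);
            echo "{$value},";
        }
        echo "\n";
    }
    fclose($handle);
}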
■ Output
word, Chinese translation,
Hi, ?好,
Thank you, ???,
Why not save the CSV as UTF-8 with a BOM?
<?php
// Write the CSV as UTF-8 with a BOM.
$fp = fopen('test.csv', 'wb');
$list = [
    ["Words", "Chinese translation"],
    ["Hello", "I like you"],
    ['Thank you', '谢啦谢'],
];
// Prepend BOM
fputs($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));
foreach ($list as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
<?php
// Read the CSV: UTF-8 if it starts with a BOM, otherwise SJIS-win.
$fp = fopen('test.csv', 'r');
$bom = chr(0xEF) . chr(0xBB) . chr(0xBF);
if ($fp !== FALSE) {
    $encoding = 'UTF-8';
    if (fgets($fp, 4) !== $bom) {
        // No BOM: go back to the start and treat the file as SJIS-win.
        rewind($fp);
        $encoding = 'SJIS-WIN';
    }
    while (($data = fgetcsv($fp)) !== FALSE) {
        foreach ($data as $value) {
            mb_convert_variables('UTF-8', $encoding, $value);
            echo "{$value},";
        }
        echo "\n";
    }
    fclose($fp);
}
Shift_JIS (and CP932, which adds Microsoft's extensions) does not contain Chinese characters such as 你, 谢, and 啦. Therefore, a Shift_JIS CSV file containing these characters cannot exist.
Judging from the hex dump, every character that cannot be represented in Shift_JIS had already been converted to a literal ? (3f) when the CSV was saved, and the original Chinese information has been lost. So you'll need to rethink how the CSV is saved.
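The loss can be reproduced in a few lines. This is only a sketch: it assumes the mbstring extension is loaded, sets the substitute character to ? explicitly, and uses the sample string 你好 purely for illustration.

```php
<?php
// Sketch: converting text to SJIS-win replaces characters that have no
// Shift_JIS code with the substitute character; the original is not recoverable.
mb_substitute_character(0x3F);   // use "?" for unconvertible characters

$utf8 = '你好';                   // 你 has no code in SJIS-win, 好 does
$sjis = mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8');
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS-win');

echo bin2hex($sjis), "\n";       // 3f8d44, the same bytes that appear in the dump
echo $back, "\n";                // ?好
```

Converting back to UTF-8 cannot restore the character, because the file only ever contained the byte 3f.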
In most cases, the reason a CSV has to be saved as Shift_JIS is that people want to open and check it in Excel. Excel reads a UTF-8 CSV correctly only if it has a BOM.
The code above mimics that Excel behavior in PHP. The reader treats a file that starts with a BOM as UTF-8 and falls back to Shift_JIS otherwise, so existing Shift_JIS CSVs can still be read.
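As a quick check of this approach, the following sketch writes one row to a scratch file as UTF-8 with a BOM and reads it back using the same BOM test as the reader above. The temporary file and the variable names are assumptions for illustration, and the CSV separator, enclosure, and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Sketch: round-trip one row through a UTF-8 CSV with a BOM.
$bom  = "\xEF\xBB\xBF";
$path = tempnam(sys_get_temp_dir(), 'csv');   // hypothetical scratch file

// Write: BOM first, then the row.
$fp = fopen($path, 'wb');
fwrite($fp, $bom);
fputcsv($fp, ['Thank you', '谢啦谢'], ',', '"', '\\');
fclose($fp);

// Read: a leading BOM means UTF-8, anything else is treated as SJIS-win.
$fp = fopen($path, 'rb');
$encoding = 'UTF-8';
if (fread($fp, 3) !== $bom) {
    rewind($fp);
    $encoding = 'SJIS-win';
}
$row = fgetcsv($fp, 0, ',', '"', '\\');
fclose($fp);
unlink($path);

// Convert every field in the row to UTF-8 (a no-op when it already is).
mb_convert_variables('UTF-8', $encoding, $row);
echo implode(',', $row), "\n";
```

Because the Chinese text never passes through Shift_JIS, it comes back unchanged.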