How to read a Shift_JIS (SJIS-win) CSV containing both Chinese and Japanese in PHP

Asked 2 years ago, Updated 2 years ago, 79 views

Hello
I'm trying to read an SJIS-win CSV that mixes Chinese and Japanese (shown below) in PHP, but only the Chinese text comes out garbled. How can I read it as UTF-8?

[FYI] For project reasons the CSV itself has to be saved as SJIS-win, so I want to convert SJIS-win -> UTF-8.

■ csv

word, Chinese translation
Hi, Nice to meet you
Thank you, 谢啦谢

■ csv hex dump

$ hexdump -x ./test.csv
0000000    be8c    7497    922c    8d86    8c91    96ea    0af3    b182
0000010    f182    c982    bf82    cd82    3f2c    448d    820a    82a0
0000020    82e8    82aa    82c6    2ca4    3f3f    0a3f
000002c
$ xxd ./test.csv
00000000: 8cbe 9774 2c92 868d 918c ea96 f30a 82b1  ...t,...........
00000010: 82f1 82c9 82bf 82cd 2c3f 8d44 0a82 a082  ........,?.D....
00000020: e882 aa82 c682 a42c 3f3f 3f0a            .......,???.

■ Source

<?php
if (($handle = fopen($argv[1], "r")) !== FALSE) {
    while (($data = fgetcsv($handle)) !== FALSE) {
        foreach ($data as $value) {
            // Convert each field from SJIS-win to UTF-8 in place
            mb_convert_variables('utf-8', 'sjis-win', $value);
            echo "{$value},";
        }
        echo "\n";
    }
    fclose($handle);
}

■ Output

word, Chinese translation,
Hi, ?好,
Thank you, ???,

php csv shift-jis locale

2022-09-30 11:33

1 Answer

How about saving the CSV as UTF-8 with a BOM instead?

CSV output code

<?php

$fp = fopen('test.csv', 'wb');

$list = [
    ["Words", "Chinese translation"],
    ["Hello", "I like you"],
    ['Thank you', '谢啦谢'],
];

// Prepend the UTF-8 BOM so that Excel recognizes the encoding
fputs($fp, chr(0xEF) . chr(0xBB) . chr(0xBF));

foreach ($list as $fields) {
    fputcsv($fp, $fields);
}

fclose($fp);

CSV input code

<?php

$fp = fopen('test.csv', 'r');
$bom = chr(0xEF) . chr(0xBB) . chr(0xBF);

if ($fp !== FALSE) {
    // Treat the file as UTF-8 if it starts with a BOM, otherwise as SJIS-WIN
    $encoding = 'UTF-8';
    if (fgets($fp, 4) !== $bom) { // fgets($fp, 4) reads at most 3 bytes
        rewind($fp);
        $encoding = 'SJIS-WIN';
    }

    while (($data = fgetcsv($fp)) !== FALSE) {
        foreach ($data as $value) {
            mb_convert_variables('UTF-8', $encoding, $value);
            echo "{$value},";
        }
        echo "\n";
    }

    fclose($fp);
}

Shift_JIS (and CP932, which adds Microsoft's extensions) does not contain Chinese characters such as 谢 and 啦. A Shift_JIS CSV file that actually contains these characters therefore cannot exist.
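One way to check this for yourself is a round trip through SJIS-win. The following is only a minimal sketch: the helper name `isRepresentableInSjisWin` is made up for illustration, and it assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
// Make the substitute character explicit: unmappable characters become '?' (0x3F).
mb_substitute_character(0x3F);

// Hypothetical helper: true if $utf8 survives UTF-8 -> SJIS-win -> UTF-8 unchanged.
function isRepresentableInSjisWin(string $utf8): bool
{
    $sjis = mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8');
    return mb_convert_encoding($sjis, 'UTF-8', 'SJIS-win') === $utf8;
}

var_dump(isRepresentableInSjisWin('ありがとう')); // Japanese kana
var_dump(isRepresentableInSjisWin('谢'));         // simplified Chinese character

// Show the byte that is actually written for the unmappable character.
echo bin2hex(mb_convert_encoding('谢', 'SJIS-win', 'UTF-8')), "\n";
```

The kana string should come back unchanged, while the Chinese character is replaced by the substitute byte during the first conversion and cannot be recovered by the second.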

Judging from the hex dump, every character that cannot be represented in Shift_JIS had already been converted to a literal ? (0x3f) when the CSV was saved, so the original Chinese text has been lost. You will need to rethink how the data is saved to CSV.
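To see this without a hex viewer, the bytes from the question's xxd output can be decoded directly. This is just an illustrative sketch, assuming mbstring is loaded; the hex string is copied from the dump and the variable names are arbitrary.

```php
<?php
// Bytes of test.csv exactly as shown by xxd in the question (44 bytes).
$hex = '8cbe97742c92868d918cea96f30a82b1'
     . '82f182c982bf82cd2c3f8d440a82a082'
     . 'e882aa82c682a42c3f3f3f0a';
$bytes = hex2bin($hex);

// Count literal question marks (0x3F) stored in the file itself.
// 0x3F can never be the second byte of a Shift_JIS character, so this is unambiguous.
$questionMarks = substr_count($bytes, "\x3F");

// The data is valid SJIS-win, so the '?' are real characters, not decode errors.
$isValidSjis = mb_check_encoding($bytes, 'SJIS-win');
$text = mb_convert_encoding($bytes, 'UTF-8', 'SJIS-win');

echo "literal '?' bytes: {$questionMarks}\n";
echo 'valid SJIS-win: ', var_export($isValidSjis, true), "\n";
echo $text;
```

The Japanese columns decode cleanly, and the question marks are still there after decoding. The reading code is therefore not at fault: the loss happened when the file was written.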

In most cases, the reason a CSV is tied to Shift_JIS is that people want to open and check the saved file in Excel. Excel reads a UTF-8 CSV file correctly only if it starts with a BOM.

The code above mimics that Excel behavior in PHP. The input code accepts UTF-8 with a BOM and, for compatibility, can still read the existing Shift_JIS CSVs.
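If the reading logic is needed in more than one place, it could be wrapped in a function along these lines. This is a rough sketch rather than a finished implementation: `readCsvAsUtf8` is a hypothetical name, any file without a UTF-8 BOM is simply assumed to be SJIS-win, and Shift_JIS characters whose second byte is 0x5C (backslash) are not given any special handling.

```php
<?php
// Hypothetical helper: returns every row of a CSV with all fields as UTF-8.
// A leading UTF-8 BOM selects UTF-8; anything else is assumed to be SJIS-win.
function readCsvAsUtf8(string $path): array
{
    $bom = "\xEF\xBB\xBF";
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    $encoding = 'UTF-8';
    if (fread($fp, 3) !== $bom) {
        rewind($fp);            // no BOM: start again from the first byte
        $encoding = 'SJIS-win';
    }

    $rows = [];
    while (($data = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue;           // fgetcsv() reports a blank line as [null]
        }
        $row = [];
        foreach ($data as $value) {
            $row[] = mb_convert_encoding((string) $value, 'UTF-8', $encoding);
        }
        $rows[] = $row;
    }

    fclose($fp);
    return $rows;
}
```

Usage would be something like `$rows = readCsvAsUtf8('test.csv');`. Returning the rows instead of echoing them keeps the encoding detection separate from whatever is done with the data afterwards.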


2022-09-30 11:33
