A [utf-8] file is recognized as [ascii] after I open it, write or append to it, and save it.

Asked 2 years ago, Updated 2 years ago, 155 views

If I open a [utf-8] file, replace some text, and save it (mode w or a), the result is recognized as an [ascii] file. How can I save it so that it is still recognized as utf-8 rather than ascii? The file is used for multilingual translation, so it must not be recognized as ascii.

If I save it with encoding='UTF-8-SIG', it is fortunately recognized as UTF-8-SIG. It does not work with encoding='utf-8' (the saved file is recognized as ascii).

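src1 = "’"
tar1 = "'"
# Read the source file as UTF-8 and replace the curly apostrophe with a straight one.
with open(filepath, "r", encoding='utf-8') as file:
    content = file.read()
    content = content.replace(src1, tar1)
# Append the result to the output file, also as UTF-8.
with open(filepath2, "a", encoding='UTF-8') as file:
    file.write(content)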

I also tried codecs.open and io.open just in case, but the behavior did not change.

python utf-8 ascii encoding

2022-09-20 11:35

1 Answer

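UTF-8 encodes ASCII characters to exactly the same bytes as ASCII, so a file that contains only ASCII characters (such as unaccented English letters, digits, and basic punctuation) is identical in both encodings.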

In other words, when you open a file that contains only such characters, there is no way to tell whether it is ASCII or UTF-8. If characters from other languages are included, the file can be identified as UTF-8 by checking for UTF-8's characteristic multi-byte patterns.
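As a small illustration of the point above, the following sketch compares the encoded bytes directly. The sample strings are made up for the example; only built-in str and bytes methods are used.

```python
# ASCII-only text: the UTF-8 and ASCII encodings are the same bytes.
ascii_only = "it's"
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("ascii")
print(same_bytes)  # True

# Text with a non-ASCII character (the curly apostrophe U+2019):
# UTF-8 encodes it as a three-byte sequence, which a detector can spot.
curly = "it’s"
utf8_bytes = curly.encode("utf-8")
print(utf8_bytes)            # b'it\xe2\x80\x99s'
print(utf8_bytes.isascii())  # False
```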

With utf-8-sig, a byte order mark (BOM) is written at the start of the text file to signal that it is encoded in UTF-8, which is why it is recognized automatically, as you said.
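To see the BOM itself, here is a minimal sketch that writes the same ASCII-only text with both encodings and inspects the raw bytes. The file names plain.txt and bom.txt are placeholders created in a temporary directory, not paths from the question.

```python
import codecs
import os
import tempfile

text = "it's plain ASCII"

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.txt")  # placeholder name
    bom_path = os.path.join(tmp, "bom.txt")      # placeholder name

    # Same text, written once as utf-8 and once as utf-8-sig.
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(bom_path, "w", encoding="utf-8-sig") as f:
        f.write(text)

    # Read the files back as raw bytes to see what was actually stored.
    with open(plain_path, "rb") as f:
        plain_bytes = f.read()
    with open(bom_path, "rb") as f:
        bom_bytes = f.read()

print(plain_bytes.startswith(codecs.BOM_UTF8))  # False
print(bom_bytes.startswith(codecs.BOM_UTF8))    # True
print(bom_bytes[:3])                            # b'\xef\xbb\xbf'
```

The only difference between the two files is the three-byte prefix, so whether to use utf-8-sig depends on whether the tools that read the file expect or tolerate a BOM.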

I'm not sure how you are determining whether the file is ASCII, but try writing some Korean text to it in UTF-8 and check whether it is still reported as ASCII.
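The suggested experiment can be sketched in memory as follows. The describe_bytes function is a hypothetical helper written for this example, and it only approximates what a byte-based detector might report; real tools may use different rules. It also shows one possible explanation for the behavior in the question: if the curly apostrophe was the only non-ASCII character, replacing it leaves pure ASCII bytes.

```python
def describe_bytes(data: bytes) -> str:
    """Hypothetical helper: a rough guess at what a detector would report."""
    if data.isascii():
        return "ascii"  # also valid UTF-8, but the bytes cannot show that
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"

original = "it’s"                       # contains U+2019
replaced = original.replace("’", "'")   # now ASCII-only
korean = "한국어 it's"                   # Korean text keeps it non-ASCII

print(describe_bytes(original.encode("utf-8")))  # utf-8
print(describe_bytes(replaced.encode("utf-8")))  # ascii
print(describe_bytes(korean.encode("utf-8")))    # utf-8
```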


2022-09-20 11:35


