I am currently learning about bytes and file processing.
#Hangul is 2 bytes per character (everything else is 1 byte). 5 letters would be 10 bytes, and 'Studying Python Language' is 10 letters (including spaces), so fd.read(10)
#filename.seek(0) -> start of the file, filename.seek(1) -> current position in the file, filename.seek(2) -> end of the file
#1
filename = '100_text.txt'
with open(filename, encoding='utf-8-sig') as fd:
    fd.read(10)
    a = fd.tell()
a
#--------------------------------------------------------------------------------
#2
with open(filename, 'rb') as fd:
    fd.read(10)
    fd.tell()
    b = fd.tell()
b
#Result value --------------------------------------------------------------------
fd.read(10) -> 'Studying Python Language' #1
a -> 26 #1
fd.read(10) -> b'\xed\x8c\x8c\xec\x9d\xb4\x8d\xac' #2
fd.tell() -> 10 #2
b -> 10 #2
As far as I know, Hangul is 2 bytes per letter, and English letters, spaces, symbols, etc. are 1 byte each.
This is how I understand reading a file with the read method:
fd.read(10) reads 10 bytes, so shouldn't it return only 'Python'? (That is 9 bytes, so how does it get to 10 bytes in this case?)
Also, fd.tell should give the byte offset of the current position, but the value fd.tell returns in #1 is 26.
If it's 2 bytes per character, shouldn't the value be 18?
In #2 I opened the file in rb mode, so it is read as bytes.
fd.read(10) reads 10 bytes, so shouldn't it return only 10 characters? I wonder why the value is so long.
I'm sorry that there are so many questions and that they are not well organized; this is my first time learning this.
I'd really appreciate it if you could answer my questions!
python byte utf-8
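Your assumption is mistaken.
In UTF-8, each Hangul character is encoded as 3 bytes.
In cp949, euc-kr, etc., each Hangul character is 2 bytes. English letters and the like are 1 byte.
Also note that in text mode read(10) reads 10 characters, not 10 bytes; only in 'rb' mode does it count bytes.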
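The points above can be checked with a small sketch. It assumes the file starts with the Korean text 파이썬 언어 공부하기; the question only shows an English rendering, so this string is a guess used for illustration, and the temporary file is a hypothetical stand-in for 100_text.txt.

```python
import os
import tempfile

# Assumed sample text (not taken from the original file).
text = "파이썬 언어 공부하기"

# Bytes per Hangul character depend on the encoding.
sizes = {enc: len("한".encode(enc)) for enc in ("utf-8", "cp949", "euc-kr")}
ascii_size = len("a".encode("utf-8"))
print(sizes, ascii_size)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for 100_text.txt, written as UTF-8.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)

    # Text mode: read(10) returns 10 characters, however many bytes they take.
    with open(path, encoding="utf-8-sig") as fd:
        chars = fd.read(10)

    # Binary mode: read(10) returns exactly 10 bytes, and tell() is a byte offset.
    with open(path, "rb") as fd:
        raw = fd.read(10)
        byte_pos = fd.tell()

# 8 Hangul characters * 3 bytes + 2 spaces * 1 byte = 26 bytes.
print(len(chars), len(chars.encode("utf-8")))
print(raw, byte_pos)
```

With this sample, the 10 characters read in text mode occupy 26 bytes in UTF-8, which matches the 26 seen in #1, while the binary read stops after 10 bytes (three Hangul characters plus one space). The Python documentation describes the value returned by tell() in text mode as an opaque number, so it is safer to compare byte counts through encode() than to rely on tell() there.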