Hello, I have a question.
If the crawled text contains the special character "₩" when I save it to Excel, an error occurs while encoding to euc-kr, so it cannot be saved.
The problem with the current code is this:
Among the product options I check, one has a value like "+3,000₩". I want to replace the ₩ with an empty string, but replace does not remove it.
...
omitted...
option_chk2 = option_chk.select('a[rel]')
print(option_chk2)
# Print result:
#<a class="sbFocus" href="#" rel="">- Select option -</a>, <a href="#Medium" rel="Medium">Medium</a>, <a href="#Large (+3,000₩)" rel="Large (+3,000₩)">Large (+3,000₩)</a>
a = ' + '.join([i.string for i in option_chk2])
b = a.replace('₩₩','')
if not a:
    a = "Sold Out"
print(a)
# Print result:
#'- Select option - + Medium + Large (+3,000₩)'
print(b)
#'- Select option - + Medium + Large (+3,000₩)'
That is the situation: the ₩ is still there after replace.
I searched and found that Python treats the backslash as a special character, so it has to be written twice (₩₩), but doing that has no effect here.
The test below works well.
import re
char="₩₩"
string = ("3,000"+char)
print(string)
#Result 3,000₩
string2 = string.replace("₩₩","")
print(string2)
#Result 3,000
The backslash showed up as \ in the question form, so in the code above I arbitrarily replaced it with the special character ₩.
If you have run into a similar problem, please share your knowledge.
Thank you.
python crawling
Um...
Isn't the problem that the backslash and the character actually in your string (the won sign) are two different characters?
>>> s = '\u20a9'
>>> s
'₩'
>>> print(s)
₩
>>> s.encode("cp949")
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
s.encode("cp949")
UnicodeEncodeError: 'cp949' codec can't encode character '\u20a9' in position 0: illegal multibyte sequence
>>> s2 = '\\'
>>> s2
'\\'
>>> print(s2)
\
>>> ord(s2)
92
>>> ord(s)
8361
>>> hex(ord(s2))
'0x5c'
>>> hex(ord(s))
'0x20a9'
>>> s2.encode("cp949")
b'\\'
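The same check can be run over a whole string to see which characters would break the save. This is only a minimal sketch, assuming the target encoding is cp949; `find_unencodable` is a hypothetical helper name, not something from a library:

```python
import unicodedata


def find_unencodable(text, encoding="cp949"):
    """Return (index, char, Unicode name) for every character the codec rejects."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # unicodedata.name() tells us what the character really is
            bad.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return bad


sample = "Large (+3,000\u20a9)"  # stand-in for one crawled option string
print(find_unencodable(sample))
```

If the list comes back empty, the string should encode without error; otherwise the reported names show exactly which characters to replace.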
>>> import re
>>>
>>> s1 = "3000 \u20a9"
>>> s2 = "3000 \u005c"
>>> s1
'3000 ₩'
>>> s2
'3000 \\'
>>> print(s1, s2)
3000 ₩ 3000 \
>>>
>>> _s1 = s1.replace("₩", "")
>>> _s1
'3000 '
>>> __s1 = s1.replace("\\", "")
>>> __s1
'3000 ₩'
>>>
>>> _s2 = s2.replace("₩", "")
>>> _s2
'3000 \\'
>>> __s2 = s2.replace("\\", "")
>>> __s2
'3000 '
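Applied to the code in the question, this means replacing the won sign itself rather than the backslash. A rough sketch, assuming the option text really contains U+20A9 as in the session above; the value of `a` is hard-coded here in place of the crawled result, and `WON_SIGN` is just an illustrative name:

```python
WON_SIGN = "\u20a9"  # the character cp949 cannot encode (not the same as "\\")

# Stand-in for the string built from the crawled options
a = "- Select option - + Medium + Large (+3,000\u20a9)"

# Replace the won sign itself; replacing "\\" leaves it untouched
b = a.replace(WON_SIGN, "")
print(b)

# Encoding now succeeds without UnicodeEncodeError
encoded = b.encode("cp949")

# Alternative: drop every character the target encoding cannot represent
c = a.encode("cp949", errors="ignore").decode("cp949")
print(c)
```

The second approach is broader: it silently discards any unencodable character, so it is worth using only when losing those characters is acceptable.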