['01', '15 Good Goods \xa0\xa0\xa0\xa0\xa0\xa0', '034252340', '96,000', '814,000', '\n\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa01, '01, '01, '01, '01, '01, '01, '0
['']
['02', '16 Recommendations \xa0\xa0\xa0\xa0', '\n', '0342742342', '96,000', '814,000, '\n\xa0\xa00\xa0\n\xa01\xa0\xa0\xa0\xa0\xa03,\xa0\xa0\xa0\xa0\n\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa
Process finished with exit code 0
I've been scraping a site, and the result comes out weird, like the output above.
At the very least, it should come out like this:
01, 15 Good Product SLEFT, 034252340, 96,000, 814,000
When I searched on Google, the results said it was a Unicode problem.
I can't solve it, even when I try regular expressions with the re module.
# Build an episode list so that each row can be printed on its own line at the end.
episodes = []
# Extract the table that contains the episode contents.
table = bs.find('table', class_='td00')
for row in table.find_all('tr'):
    values = []
    for col in row.find_all('td'):
        text = col.get_text()
        rp_text = text.replace('(?<!\x0d)\x0a',' ')
        values.append(rp_text)
    if values:
        episodes.append(values)
del episodes[0]
for episode in episodes:
    print(episode)
Please give me a hint.
python
from unicodedata import normalize
s = '16 Recommended Product \xa0\xa0\xa0\xa0'
normalize('NFKD', s)
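To apply this to every cell, something like the following might work. It is only a sketch: `clean_cell` is a hypothetical helper name, the sample row is copied from the output in the question, and dropping empty cells is an assumption about the result you want. Note that `str.replace()` treats its first argument as literal text, so a pattern such as `'(?<!\x0d)\x0a'` is never interpreted as a regular expression there; `re.sub()` is the call that takes a pattern.

```python
import re
from unicodedata import normalize


def clean_cell(text):
    """Return the cell text with no-break spaces and line breaks collapsed."""
    # NFKD maps the no-break space U+00A0 to an ordinary space.
    text = normalize('NFKD', text)
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space,
    # then trim the ends.
    return re.sub(r'\s+', ' ', text).strip()


# Sample cells taken from the second row of the output in the question.
raw_row = ['02', '16 Recommendations \xa0\xa0\xa0\xa0', '\n', '0342742342', '96,000']

cleaned = [clean_cell(cell) for cell in raw_row]
# Assumption: cells that are empty after cleaning are not wanted.
cleaned = [cell for cell in cleaned if cell]

print(', '.join(cleaned))
# prints: 02, 16 Recommendations, 0342742342, 96,000
```

In the loop from the question, this would mean calling `values.append(clean_cell(col.get_text()))` in place of the `text.replace(...)` line, and printing `', '.join(episode)` at the end if you want plain lines rather than Python lists.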