https://live23.5ch.net/test/read.cgi/livetbs/1220170942/
I'd like to scrape the replies at this URL, but the following code produces garbled characters.
import requests
from bs4 import BeautifulSoup
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
soup = BeautifulSoup(res.text, 'lxml')
threadRes = soup.find_all('dd')
print(threadRes)  # => garbled characters
Alternatively, if I pass res.content instead of res.text as the first argument to BeautifulSoup, the garbled characters are fixed, but not all of the replies are scraped.
(This thread has 1001 replies, but only 223 are returned.)
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
soup = BeautifulSoup(res.content, 'lxml')
print(soup)
threadRes = soup.find_all('dd')
print(len(threadRes))  # => 223
How can I fix the garbled characters and scrape all of the replies?
python web-scraping beautifulsoup
In my environment, the output was also garbled when I used res.encoding = res.apparent_encoding,
as suggested in one of the answers referenced in the question's comments, but I have verified that setting res.encoding = "shift_jis"
displays it correctly and returns all of the replies.
import requests
from bs4 import BeautifulSoup
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
# res.encoding = res.apparent_encoding  # still garbled in my environment
# Set the encoding explicitly before reading res.text
res.encoding = "shift_jis"
soup = BeautifulSoup(res.text, 'lxml')
threadRes = soup.find_all('dd')
print(threadRes)
print(len(threadRes))  # 1001
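As a rough illustration of why setting the encoding matters, the sketch below decodes the same Shift_JIS bytes in two ways, without any network access. The sample string is made up and only stands in for res.content. The assumption here is that requests fell back to ISO-8859-1 for this page (its default for text responses whose headers declare no charset), which I have not confirmed for this particular server.

```python
# Made-up sample standing in for res.content: Shift_JIS-encoded HTML bytes.
sample = "<dd>テスト</dd>"
raw = sample.encode("shift_jis")

# Decoding with ISO-8859-1 never raises an error, but it maps each byte to
# an unrelated Latin-1 character, so the Japanese text comes out garbled.
wrong = raw.decode("iso-8859-1")

# Decoding with the encoding the bytes were actually written in
# recovers the original text.
right = raw.decode("shift_jis")

print(repr(wrong))
print(right)
```

If some characters still show up as replacement marks after setting "shift_jis", Python's "cp932" codec (the Windows superset of Shift_JIS) may be worth trying in its place.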