https://live23.5ch.net/test/read.cgi/livetbs/1220170942/
I'd like to scrape the replies from this URL, but the following code produces garbled characters.
import requests
from bs4 import BeautifulSoup
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
soup = BeautifulSoup(res.text, 'lxml')
threadRes = soup.find_all('dd')
print(threadRes)  # => garbled characters
Also, if I pass res.content instead of res.text as the first argument to BeautifulSoup, the garbled characters are fixed, but not all replies are scraped.
(This thread has 1001 replies, but only 223 are returned.)
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
soup = BeautifulSoup(res.content, 'lxml')
print(soup)
threadRes = soup.find_all('dd')
print(len(threadRes))  # => 223
How can I fix the garbled characters and scrape all the replies?
python web-scraping beautifulsoup
In my environment, setting res.encoding = res.apparent_encoding, as suggested in a comment on the question, still produced garbled text.
However, I have verified that setting res.encoding = "shift_jis" displays the text correctly and returns all replies.
import requests
from bs4 import BeautifulSoup
res = requests.get("https://live23.5ch.net/test/read.cgi/livetbs/1220170942/")
# res.encoding = res.apparent_encoding  # still garbled in my environment
res.encoding = "shift_jis"  # decode the response body explicitly as Shift_JIS
soup = BeautifulSoup(res.text, 'lxml')
threadRes = soup.find_all('dd')
print(threadRes)
print(len(threadRes))  # 1001
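As a rough illustration of why the explicit encoding matters, the sketch below reproduces the garbling offline with only the standard library. It assumes the page body is Shift_JIS and that the server does not declare a charset, in which case requests is documented to fall back to ISO-8859-1 for text responses. I have not checked this server's headers, so treat that as a likely cause rather than a confirmed one. The sample string is hypothetical and is not taken from the thread.

```python
# Hypothetical sample text standing in for one reply (not from the thread).
sample = "こんにちは、世界"

# The bytes a Shift_JIS page would send over the wire.
raw = sample.encode("shift_jis")

# Decoding with the wrong codec does not raise an error;
# it silently maps every byte to one Latin-1 character (mojibake).
wrong = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
right = raw.decode("shift_jis")

print(repr(wrong))  # garbled Latin-1 characters
print(right)        # the original Japanese text
```

Setting res.encoding before reading res.text plays the same role as choosing the codec in the second decode call: it tells requests which codec to use when turning the response bytes into a string.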