import requests

whoscored = 'https://www.whoscored.com'
search_url = whoscored  # URL to request (assumed to be the site root here)
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'referer': 'http://www.naver.com',
    'Content-Type': 'application/json; charset=utf-8',
}
cookies = {'session_id': 'sorryidontcare'}
# Send the GET request with the custom headers and cookie, then show the status
response = requests.get(search_url, headers=headers, cookies=cookies)
print(response)
Results: <Response [403]>
I'm practicing crawling with the code above, but even when I send additional headers to the site, I get a 403 error. Is there any way to get access in this situation?
python header error requests
There is an HTTP request testing tool called Postman. Using it, I sent a plain GET request to the address, with no extra headers, cookies, or authentication settings:
GET HTTP/1.1
Host: www.whoscored.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Postman-Token: 91164742-d18a-425a-908f-f56386de743d,c7bc4c16-2b3b-49d9-a128-5170200a258e
Host: www.whoscored.com
Accept-Encoding: gzip, deflate
Connection: keep-alive
cache-control: no-cache
The site then returned its page source normally...
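To try the same thing from Python, here is a minimal sketch that sends only the headers Postman sent in the request above. The names `POSTMAN_LIKE_HEADERS`, `build_request`, and `fetch` are made up for this example, and there is no guarantee the site will answer a script the same way it answered Postman.

```python
import requests

# Headers copied from the Postman request above. The Postman-Token header is
# left out because it is only a per-request identifier generated by Postman.
POSTMAN_LIKE_HEADERS = {
    "User-Agent": "PostmanRuntime/7.15.2",
    "Accept": "*/*",
    "Cache-Control": "no-cache",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}


def build_request(url):
    """Prepare (but do not send) a GET request with the Postman-like headers."""
    return requests.Request("GET", url, headers=POSTMAN_LIKE_HEADERS).prepare()


def fetch(url, timeout=10):
    """Send the prepared request and return the response object."""
    with requests.Session() as session:
        return session.send(build_request(url), timeout=timeout)
```

Calling `fetch('https://www.whoscored.com')` and printing the result would show whether the status changes from 403 once the extra `referer`, `Content-Type`, and cookie values are dropped.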
Perhaps you want to log in to the site and crawl a page that requires an active session. If so, you have to go through several steps (send a login request with your ID and password → parse the cookies or authentication token from the login response → send those cookies or that token with the request for the page you want to crawl), and if something in that process works against you (an IP check, for example), you may not be able to do anything at all.
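The login steps above could look roughly like this with `requests`. This is only a sketch: the function name `crawl_after_login`, the URLs, and the form field names are all hypothetical, and a real site may use a different login mechanism entirely.

```python
import requests


def crawl_after_login(login_url, target_url, credentials, session=None):
    """Log in, then fetch target_url through the same session.

    A requests.Session stores cookies set by the login response and sends
    them automatically on later requests made through that session.
    """
    if session is None:
        session = requests.Session()

    # Step 1: request login with the ID and password (sent as form fields).
    login_response = session.post(login_url, data=credentials, timeout=10)
    login_response.raise_for_status()  # stop early if the login request failed

    # Step 2: cookies from the login response are now held by the session.
    # If the site returns a token in the response body instead, it would have
    # to be parsed here and added to session.headers (site-specific).

    # Step 3: request the page to crawl; the session attaches the cookies.
    page = session.get(target_url, timeout=10)
    page.raise_for_status()
    return page.text
```

A call might look like `crawl_after_login('https://example.com/login', 'https://example.com/data', {'id': '...', 'pw': '...'})`, with the URLs and field names replaced by whatever the target site actually uses.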
In any case, before you crawl anything, first check whether you can get a response with a plain HTTP request, using Postman or a similar tool. You cannot crawl a page until you can get a response back from it.
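That check can also be written down as a small helper. The function `check_plain_response` and its verdict strings are invented for this illustration; it only looks at the status code and body, so it cannot tell why a request was refused.

```python
def check_plain_response(status_code, body):
    """Return a short verdict on whether a page looks crawlable as-is."""
    if status_code in (401, 403):
        # The server refused the request, e.g. login required or client blocked.
        return "blocked: the server refused the request"
    if status_code >= 400:
        return "error: the request failed with status %d" % status_code
    if not body.strip():
        return "empty: the request succeeded but returned no content"
    return "ok: the page source came back and can be parsed"
```

With `requests`, it would be called as `check_plain_response(response.status_code, response.text)`; the 403 from the question falls into the "blocked" case.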
© 2024 OneMinuteCode. All rights reserved.