import requests

whoscored = 'https://www.whoscored.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'referer': 'http://www.naver.com',
    'Content-Type': 'application/json; charset=utf-8'
}
cookies = {'session_id': 'sorryidontcare'}
request = requests.get(whoscored, headers=headers, cookies=cookies)
print(request)
Result: <Response [403]>
I'm practicing crawling with the code above, but even when I pass extra headers to the site like this, I still get a 403 error. Is there any way to get access in this situation?
python header error requests
There is an HTTP request testing tool called Postman. With it, I sent a plain GET to the address, with no extra headers, cookies, or authentication settings:
GET / HTTP/1.1
Host: www.whoscored.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Postman-Token: 91164742-d18a-425a-908f-f56386de743d,c7bc4c16-2b3b-49d9-a128-5170200a258e
Accept-Encoding: gzip, deflate
Connection: keep-alive
Then the site responds normally with its page source...
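To reproduce that bare request in Python, a minimal sketch like the following should behave the same way. The User-Agent value here is just an illustration; the point is that no cookies and no Content-Type header are sent:

import requests

# Mimic Postman's bare GET: no cookies, no Content-Type, just a
# simple User-Agent and Accept header.
response = requests.get(
    'https://www.whoscored.com',
    headers={'User-Agent': 'PostmanRuntime/7.15.2', 'Accept': '*/*'},
)
print(response)             # e.g. <Response [200]> if the server accepts it
print(response.text[:200])  # beginning of the page source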
Maybe you want to log in to the site and crawl pages that are only available inside a session. If so, you have to go through several steps (send a login request with your ID and password → parse the cookies or auth tokens out of the login response → send those cookies/tokens along when crawling the pages you want), and if you hit an obstacle along the way (such as an IP check), you may not be able to do it at all.
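As a rough sketch of that flow, requests.Session stores the cookies from the login response and sends them on later requests automatically. The login URL and form field names below are hypothetical placeholders, not whoscored's real endpoints; check the actual login request in your browser's developer tools:

import requests

session = requests.Session()  # carries cookies across requests

# Hypothetical login endpoint and form fields (placeholders).
login = session.post(
    'https://example.com/login',
    data={'id': 'my_id', 'pw': 'my_password'},
)
login.raise_for_status()  # stop here if the login itself failed

# The session now holds the cookies/tokens returned by the login,
# so this request is sent as the logged-in user.
page = session.get('https://example.com/protected-page')
print(page.status_code)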
Anyway, before you crawl something, first check whether you can get a response with a pure HTTP request in Postman or a similar tool. You have to be able to get the response back before you can do any crawling.