What should I do if I still get a 403 error with Python requests even after adding headers?

import requests

whoscored = 'https://www.whoscored.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'referer': 'http://www.naver.com',
    'Content-Type': 'application/json; charset=utf-8'
}
cookies = {'session_id': 'sorryidontcare'}
# Send a GET request to the site with the custom headers and cookies
response = requests.get(whoscored, headers=headers, cookies=cookies)
print(response)

Result: <Response [403]>

I'm practicing crawling with the code above, but the site still returns a 403 error even though I pass additional headers. Is there any way to get access in this situation?

python header error requests

2022-09-21 16:18

1 Answer

There is an HTTP request testing tool called Postman. Using it, I simply sent a GET request to the address, without configuring any extra headers, cookies, or authentication.

GET  HTTP/1.1
Host: www.whoscored.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Postman-Token: 91164742-d18a-425a-908f-f56386de743d,c7bc4c16-2b3b-49d9-a128-5170200a258e
Host: www.whoscored.com
Accept-Encoding: gzip, deflate
Connection: keep-alive
cache-control: no-cache

With that request, the site returns its page source normally.
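
To try the same thing from Python, one option is to send only the headers that Postman sent, with no cookies or referer. The following is a minimal sketch under that assumption; build_probe_request and send_probe are hypothetical helper names, and whether the site still accepts such a request is not guaranteed.

```python
import requests

# Headers copied from the Postman request shown above
# (the Postman-Token and the duplicated entries are left out).
POSTMAN_LIKE_HEADERS = {
    "User-Agent": "PostmanRuntime/7.15.2",
    "Accept": "*/*",
    "Cache-Control": "no-cache",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}


def build_probe_request(url):
    """Prepare (but do not send) a bare GET carrying only the headers above."""
    return requests.Request("GET", url, headers=POSTMAN_LIKE_HEADERS).prepare()


def send_probe(url, timeout=10):
    """Send the prepared request and return the response object."""
    with requests.Session() as session:
        return session.send(build_probe_request(url), timeout=timeout)
```

Calling send_probe('https://www.whoscored.com') and printing the returned object shows whether the status changes once the extra headers and the dummy cookie are removed.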

Perhaps you want to log in to the site and crawl a page that is only available while a session exists. In that case you have to go through several steps: send a login request with your ID and password, parse the cookies or authentication token from the login response, and then attach those cookies or that token to the requests for the pages you want to crawl. If you are unlucky along the way (for example, the site checks IP addresses), you may not be able to get through at all.
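
The login steps described above might look roughly like this with requests. This is only a sketch: the URLs and the form field names ("id", "pw") are made up for illustration, and a real site may use different fields, a token in a header, or extra checks.

```python
import requests

# Hypothetical values: the real login URL and target page depend on the site.
LOGIN_URL = "https://example.com/login"
TARGET_URL = "https://example.com/protected/page"


def login_and_fetch(session, user_id, password, timeout=10):
    """Log in, then request the target page on the same session.

    A requests.Session stores the cookies set by the login response and
    sends them back automatically on later requests made through it.
    """
    login_response = session.post(
        LOGIN_URL,
        data={"id": user_id, "pw": password},  # assumed form field names
        timeout=timeout,
    )
    login_response.raise_for_status()  # stop early if the login itself failed
    return session.get(TARGET_URL, timeout=timeout)
```

A typical call would create the session first, for example: with requests.Session() as s: page = login_and_fetch(s, 'my_id', 'my_pw'). If the site returns a token in the response body instead of a cookie, it would have to be parsed and added to the later request by hand.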

In any case, before you crawl anything, first check whether a plain HTTP request, sent with Postman or a similar tool, gets a response at all. You need a valid response back before there is anything to crawl.
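
The same check can be done in Python before writing any parsing code. The helper names below (describe_response, probe) are hypothetical, and the categories are just one reasonable way to sort the outcomes.

```python
import requests


def describe_response(response):
    """Return a short verdict on whether a response is worth parsing."""
    if response.status_code == 403:
        return "blocked: the server refused the request (403 Forbidden)"
    if not response.ok:
        return "failed: HTTP status {}".format(response.status_code)
    if not response.text.strip():
        return "empty: the status was fine but the body has no content"
    return "ok: {} characters of body to parse".format(len(response.text))


def probe(url, timeout=10):
    """Send a plain GET with no custom headers and describe the result."""
    return describe_response(requests.get(url, timeout=timeout))
```

For example, print(probe('https://www.whoscored.com')) gives a one-line verdict; only when it starts with "ok" is it worth moving on to the actual crawling.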

2022-09-21 16:18
