What should I do if 403 errors occur even after adding headers when using Python requests?

Asked 2 years ago, Updated 2 years ago, 95 views

import requests

whoscored = 'https://www.whoscored.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'Referer': 'http://www.naver.com',
    'Content-Type': 'application/json; charset=utf-8',
}
cookies = {'session_id': 'sorryidontcare'}
request = requests.get(whoscored, headers=headers, cookies=cookies)
print(request)

Result: <Response [403]>

I'm practicing crawling with the code above, but even when I pass extra headers to the site, I still get a 403 error. Is there any way to get past this?

python header error requests

2022-09-21 16:18

1 Answer

There is an HTTP request testing tool called Postman. Using it, I sent a plain GET request to the address, without any custom headers, cookies, or authentication settings.

GET / HTTP/1.1
Host: www.whoscored.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Postman-Token: 91164742-d18a-425a-908f-f56386de743d,c7bc4c16-2b3b-49d9-a128-5170200a258e
Accept-Encoding: gzip, deflate
Connection: keep-alive

Then the site responded normally with its page source...
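
If you want to reproduce the same bare-request test from Python instead of Postman, a minimal sketch could look like this. One caveat: requests sends its own default User-Agent (python-requests/x.y.z), which some sites may treat differently from Postman's, so this only shows what a plain request from your script actually gets back:

import requests

# Bare GET: no custom headers, cookies, or authentication, mirroring
# the Postman test above; requests fills in its own defaults.
response = requests.get('https://www.whoscored.com')
print(response.status_code)  # 200 means the plain request got through
print(response.text[:300])   # a peek at whatever page source came back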

Maybe you want to log in to the site and crawl an address that is only available while a session exists. If so, you have to go through several steps (send a login request with your ID and password → parse the cookies or authentication tokens out of the login response → attach those cookies/tokens to the requests you make while crawling the pages you want; a rough sketch follows below), and if you hit an obstacle along the way (such as an IP check), you may not be able to do anything at all.
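
For reference, here is a rough sketch of that login flow using requests.Session, which stores the cookies from the login response and attaches them to later requests automatically. The login URL and form field names below are hypothetical; you would have to inspect the real site's login form or API to find the actual ones:

import requests

LOGIN_URL = 'https://example.com/login'       # hypothetical login endpoint
TARGET_URL = 'https://example.com/protected'  # hypothetical page to crawl

with requests.Session() as session:
    # Step 1: request login with your ID and password.
    login_response = session.post(LOGIN_URL, data={
        'id': 'your_id',        # hypothetical form field name
        'password': 'your_pw',  # hypothetical form field name
    })
    login_response.raise_for_status()

    # Step 2: the Session keeps any cookies set by the login response.
    # If the site returns a token in the response body instead, you would
    # parse it here and pass it as a header on later requests.

    # Step 3: the stored cookies ride along when crawling the target page.
    page = session.get(TARGET_URL)
    print(page.status_code)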

Anyway, before you crawl anything, first check whether you can get a response with a pure HTTP request from a tool like Postman. You have to be able to get a response back before you can do any crawling.


2022-09-21 16:18

