Why does Python requests return 403 Forbidden when I fetch this page?

First, this is the page I want to scrape and parse.

https://www.influenster.com/reviews/farmacy-honeymoon-glow-aha-resurfacing-night-serum-with-echinacea-greenenvytm

I fetched the page with requests as usual, but it keeps returning 403 Forbidden. The result is the same even after I copied in all the headers the browser sends.

I sent requests.get() without any cookies. In the browser's network tab, the data already comes back in the very first HTTP request, which is sent without a cookie, so the check does not appear to depend on a cookie value.

This is the first time I have run into this, so I would like to hear from people with more experience.

In the network tab, the request headers sent to the server were as follows.

The HTTP preview shows that the browser received the data normally, as shown below. (This is the first HTTP request in the waterfall.)

This is the result of sending the same request headers and URL through Postman. As you can see, it is also rejected with 403.

If the cookie handling were complicated, or if a CSRF token or JWT were required, I would work through it somehow. But no custom header is required, and I am at a loss because this is the first time a request has been rejected even though I matched the request headers the browser sends.

I also tried sending the request through a requests session, in case a session was needed, but the result is the same.

I would appreciate any help in solving this problem.

--- Headers used ---

my_headers = {
    'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' : 'ko-kr',
    'Connection' : 'keep-alive',
    'Host' : 'www.influenster.com',
    'Accept-Encoding' : 'gzip, deflate, br'
}
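
Since the same headers were also tried in Postman: header values copied out of a Postman export arrive as a JSON list of entries with "key", "value" and "enabled" fields rather than as plain strings, and a value pasted in that form is sent to the server literally. Below is a minimal sketch, assuming that export shape, for flattening it into a plain dict that requests can use. The function name is made up for illustration.

```python
import json


def headers_from_postman_export(exported):
    """Flatten a Postman-style JSON header list into a plain dict."""
    entries = json.loads(exported)
    # Keep only enabled entries; a missing "enabled" flag is treated as enabled.
    return {e["key"]: e["value"] for e in entries if e.get("enabled", True)}


# Example input in the assumed export shape.
exported = '[{"key":"Accept-Language","value":"ko-kr","description":"","type":"text","enabled":true}]'
clean_headers = headers_from_postman_export(exported)
print(clean_headers)  # {'Accept-Language': 'ko-kr'}
```

The resulting dict can be merged into my_headers before calling requests.get(). This only tidies up the headers; it is not expected to get past the 403 on its own.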

requests python http

2022-09-21 14:01

1 Answer

There is already a library for this: cfscrape, which is built to get past Cloudflare's anti-bot page.

pip install cfscrape

https://github.com/Anorov/cloudflare-scrape

import cfscrape

scraper = cfscrape.create_scraper()
r = scraper.get("https://www.influenster.com/reviews/farmacy-honeymoon-glow-aha-resurfacing-night-serum-with-echinacea-greenenvytm")
r.status_code  # 200

print(r.content.decode('utf-8'))
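
Before switching libraries, it can help to check whether the 403 really comes from Cloudflare rather than from the site itself. Below is a rough heuristic sketch, assuming Cloudflare's usual response headers (Server: cloudflare and CF-RAY). The helper name is hypothetical, and the check is a hint, not proof.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: does this response look like a Cloudflare challenge or block?"""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    # Assumption: challenge/block pages typically arrive as 403 or 503.
    return status_code in (403, 503) and served_by_cloudflare
```

With plain requests this would be called as looks_like_cloudflare_block(r.status_code, r.headers) on the failing response.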

2022-09-21 14:01
