First of all, the page I want to scrape/parse is as follows.
I approached the page with requests as I normally would, but it keeps returning 403 Forbidden. The result is the same even after I modified the headers and added every header the browser sends.
I also sent requests.get() with the cookie value removed. When I looked at the network tab in the browser, the very first HTTP request goes out without a cookie and still receives the data, so the server does not seem to be validating the cookie.
This is the first time I have run into this, so I would like to hear the opinions of the experts.
When I looked at the network tab, the request headers for that call were as follows.
If you look at the HTTP preview, you can see that the browser received the data normally, as shown below.
(This is the first HTTP request in the waterfall.)
This is the result when the same request headers and address are sent through Postman. As you can see, it is rejected with 403.
If the cookie were set up in some complicated way, or if a CSRF token or JWT were needed, I would try to work around it somehow. But no custom header is required, and I am puzzled because this is the first time a request has been rejected like this even though I matched the request headers exactly.
I also opened a session with requests and sent the request that way, in case a session was needed, but the result is the same.
I'll wait and see whether an expert can solve this problem.
---- Headers used ----
my_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'ko-kr',
    'Connection': 'keep-alive',
    'Host': 'www.influenster.com',
    'Accept-Encoding': 'gzip, deflate, br'
}
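There's already a solution. The 403 most likely comes from Cloudflare's anti-bot check rather than from your headers or cookies, and the cfscrape package is built to get past that check.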
pip install cfscrape
https://github.com/Anorov/cloudflare-scrape
import cfscrape
scraper = cfscrape.create_scraper()
r = scraper.get("https://www.influenster.com/reviews/farmacy-honeymoon-glow-aha-resurfacing-night-serum-with-echinacea-greenenvytm")
r.status_code # 200
print(r.content.decode('utf-8'))
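To check whether a 403 like the one in the question really comes from Cloudflare before reaching for cfscrape, you can look at the response headers. The following is only a rough sketch: the function name looks_like_cloudflare_block and the sample header values are made up for illustration, and the check is a heuristic based on the Server and CF-RAY headers that Cloudflare normally adds to its responses.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True if a response looks like a Cloudflare block or challenge."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "").lower()
        or "cf-ray" in lowered
    )
    # Cloudflare blocks and challenges typically come back as 403 or 503.
    return served_by_cloudflare and status_code in (403, 503)


# Hypothetical response values, for illustration only.
sample_headers = {"Server": "cloudflare", "CF-RAY": "0000000000000000-ICN"}
print(looks_like_cloudflare_block(403, sample_headers))
print(looks_like_cloudflare_block(200, sample_headers))
```

With requests, you would pass r.status_code and r.headers from the failing response. If the function returns False, the 403 probably has another cause and cfscrape may not help.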