Python web scraping suddenly rejected by the site

Asked 2 years ago, Updated 2 years ago, 441 views

Hello.

I'm asking because a stock site I have been scraping suddenly started denying access.

The error is as follows:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.investing.com/equities/alibaba

When I googled it, most results only say to set the User-Agent in the headers to avoid bot detection. My code already does that and had been crawling fine, but from one day on the requests started being rejected.

First, I tried fake user agent and random user agent to send a random User-Agent, but the request is always rejected. Then I changed my IP with a VPN in case the block was IP-based, but it is still rejected. If the block were based on IP, ordinary browsing from the same machine should be rejected too, not just crawling. Ordinary browsing works fine, so it does not seem to be the IP.

The last thing I found was about the Referer: a request can be flagged as a bot if it carries no information about which page it came from. So I installed the Referer Control browser extension, but I can't find any information on how to use it. Haha
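On the Referer point: with requests no browser extension is needed, because Referer is just another request header. Below is a minimal sketch, assuming a helper named browser_like_headers (hypothetical, not from the question) and example Accept and Accept-Language values chosen for illustration. Sending these headers may well not be enough to get past the 403 if the site relies on other signals.

```python
def browser_like_headers(user_agent, referer):
    """Build request headers that resemble an ordinary browser visit."""
    return {
        "User-Agent": user_agent,
        "Referer": referer,  # the page the request claims to come from
        # The two values below are illustrative assumptions, not required ones.
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }


def fetch(url, user_agent, referer="https://www.investing.com/"):
    """GET url with the headers above; the default referer is an assumption."""
    import requests  # third-party; imported here so the helper above needs nothing extra

    return requests.get(url, headers=browser_like_headers(user_agent, referer), timeout=10)
```

For example, fetch("https://www.investing.com/equities/alibaba", generate_user_agent()) would send the same request as the code in the question, plus the Referer and Accept headers.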

The code is as follows.

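import requests
from user_agent import generate_user_agent  # assumed source: the user_agent package

url = "https://www.investing.com/equities/alibaba"

# Send a randomly generated User-Agent with the request.
ua = generate_user_agent()
headers = {"User-Agent": ua}

res = requests.get(url, headers=headers)
print("Answer Code:", res.status_code)

res.raise_for_status()  # raises HTTPError here because of the 403
print("Start")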

Other sites work without any problem. The 403 Client Error occurs only on investing.com, the site I had been scraping.
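When only one site answers with 403, it can help to look at what the rejected response actually contains before changing more headers. The following is a rough sketch, assuming a hypothetical helper named summarize_rejection; the list of headers it prints is an arbitrary choice for illustration.

```python
def summarize_rejection(status_code, headers, body, max_chars=200):
    """Return a short, printable summary of a rejected HTTP response."""
    wanted = ("server", "content-type", "retry-after")  # arbitrary selection
    lowered = {name.lower(): value for name, value in headers.items()}
    lines = [f"status: {status_code}"]
    for name in wanted:
        if name in lowered:
            lines.append(f"{name}: {lowered[name]}")
    # Collapse whitespace so the start of the body fits on one line.
    snippet = " ".join(body.split())[:max_chars]
    lines.append(f"body starts with: {snippet}")
    return "\n".join(lines)


# Possible use with the code from the question:
# try:
#     res.raise_for_status()
# except requests.exceptions.HTTPError as err:
#     r = err.response
#     print(summarize_rejection(r.status_code, r.headers, r.text))
```

The Server header and the first lines of the body often hint at whether the refusal comes from the site itself or from a protection layer in front of it.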

Please help me.

python beautifulsoup scraping crawling

2022-11-04 00:00

1 Answer

I came across this while searching about a Naver login problem: try Selenium attached to Chrome started in remote-debugging mode. It's a real Chrome browser, so features such as staying logged in to Naver also work.

import subprocess
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Launch real Chrome with remote debugging; the user-data-dir path is only an example.
chromeProcess = subprocess.Popen([r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "--remote-debugging-port=9222", r"--user-data-dir=C:\chrome-debug-profile"])
option = Options()
option.add_experimental_option("debuggerAddress", "127.0.0.1:9222")  # attach to that Chrome
driver = webdriver.Chrome(options=option)

# From here on, use driver like any normal Selenium session.

chromeProcess.terminate()  # Close the Chrome browser
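One practical detail with this approach: Chrome needs a moment after launch before the debugging port accepts connections, and attaching too early can fail. A small sketch of a wait step follows, assuming a hypothetical helper named wait_for_port; the timeout and polling interval are arbitrary example values.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not ready yet; try again shortly
    return False
```

It would be called between starting Chrome and creating the driver, for example: if not wait_for_port("127.0.0.1", 9222): raise RuntimeError("Chrome debugging port did not open").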


2022-11-04 00:00

© 2024 OneMinuteCode. All rights reserved.