Crawling pages that require login / downloading files with Selenium

Asked 2 years ago, Updated 2 years ago, 49 views

I want to scrape and save an image file from a page that is only accessible after logging in. I'm using Selenium because the site relies on JavaScript, but saving the file with requests seems difficult because the image file can't be accessed without being logged in.

Is there a way to save the file using the webdriver? Or is there some other solution?

python selenium crawling

2022-09-21 13:54

2 Answers

I found a way and solved it myself. After logging in with Selenium, you can copy its cookies into a requests session.

import requests

# After logging in with Selenium, copy the driver's cookies into a
# requests session so requests is authenticated too.
with requests.Session() as s:
    for cookie in driver.get_cookies():
        s.cookies.set(cookie['name'], cookie['value'])
    # Download through the session (s.get, not requests.get) so the
    # login cookies are actually sent with the request.
    response = s.get(url, headers=HEADER)
    with open('img.png', 'wb') as file:
        file.write(response.content)


2022-09-21 13:54

You may need basic knowledge of the web and JavaScript.

First, figure out where and how the page sends the login request (via an onclick handler, a form submit, or other JavaScript processing; on today's pages it is almost always JavaScript).

Then build a payload in the format the server expects (usually JSON) and create a session that sends those values to that endpoint.

To create a session, you typically use requests.Session() from the requests module.
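As a rough sketch of the approach above: the login URL and the field names in the payload below are placeholders, not the real ones — you have to find the actual endpoint and parameter names in your browser's developer tools (Network tab) while logging in.

```python
import requests

def login_and_fetch(login_url, username, password, image_url):
    """Log in by POSTing credentials, then fetch a protected image.

    login_url and the payload field names are hypothetical; replace them
    with what the real site sends (check the browser's Network tab).
    """
    with requests.Session() as s:
        # POST the credentials; the session stores any cookies the server sets.
        resp = s.post(login_url, data={"username": username, "password": password})
        resp.raise_for_status()
        # Later requests on the same session automatically send those cookies,
        # so protected resources become accessible.
        img = s.get(image_url)
        img.raise_for_status()
        return img.content
```

If the site expects a JSON body rather than form fields, pass `json={...}` instead of `data={...}` to `s.post()`.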

I referred to the link below when I wrote similar code before. I think it will be very helpful for you too: https://beomi.github.io/2017/01/20/HowToMakeWebCrawler-With-Login/


2022-09-21 13:54



© 2024 OneMinuteCode. All rights reserved.