I'd like to download the paid materials from a paid site after logging in. Doing it by hand is too annoying, so I want to fetch them all with Python.
I tried adapting code I found on a foreign site, but I'm stuck, so I'm asking here.
How should I modify the code below?
import requests
from bs4 import BeautifulSoup

myheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'}

with requests.Session() as s:
    url = 'http://www.exam4you.com'
    r = s.get(url, headers=myheaders)
    soup = BeautifulSoup(r.content, 'html.parser')

    # These are the values I saw in the network tab after logging in through the browser,
    # but I don't know what ts and u1 are.
    login_data = {'userId': '?????',
                  'passwd': '??????????',
                  'ts': '1589378756791',
                  'u1': '08c8a162ba73af9e033d096b596463e1'}

    # This is the part where I'm stuck.
    login_data['u1'] = soup.find('input', attrs={'name': 'u1'})['value']

    r = s.post(url, data=login_data, headers=myheaders)
    print(r.content)
Let's go back to the drawing board: how do you download files in batches?
1. If you are lucky, the file names look like
http://foo.com/files/1.zip
http://foo.com/files/2.zip
http://foo.com/files/3.zip
...
and the download starts as soon as you hit the URL. Then you can just change the number and fetch each one, because changing the number is already a consistent way to produce every file address.
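For example, under the assumption that the numbered pattern above is real and no login is needed yet, a minimal sketch would be:

import requests

with requests.Session() as s:
    for n in range(1, 11):                       # try files 1.zip .. 10.zip
        url = f'http://foo.com/files/{n}.zip'    # hypothetical numbered pattern
        r = s.get(url)
        if r.status_code != 200:                 # skip numbers that don't exist
            continue
        with open(f'{n}.zip', 'wb') as f:        # save the raw bytes to disk
            f.write(r.content)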
2. But what if:
http://foo.com/files/ekrxfgbuiy.zip
http://foo.com/files/q324597ybnzfkdj.zip
http://foo.com/files/234.zip
...
Then we need to find another consistent way to get those file addresses. Let's say the site has pages like these:
http://foo.com/index.php?board=yourFile&id=1
http://foo.com/index.php?board=yourFile&id=2
http://foo.com/index.php?board=yourFile&id=3
And let's say that every one of those pages contains a button like this:
<button onclick="startDownload('ekrxfgbuiy')">Download</button>
Then BeautifulSoup can do the job. Step through the numbered URLs, parse each page with BeautifulSoup, call .find() to locate that button, and pull the value out of the startDownload() argument.
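Here is a minimal sketch of that idea. Everything in it is an assumption based on the made-up example above: the index.php?board=yourFile&id=<n> pages, the onclick="startDownload('...')" button, and especially the guess that the extracted name maps to http://foo.com/files/<name>.zip, which you would have to confirm in the site's own JavaScript.

import re
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    for n in range(1, 11):
        page = s.get(f'http://foo.com/index.php?board=yourFile&id={n}')      # hypothetical board page
        soup = BeautifulSoup(page.content, 'html.parser')
        button = soup.find('button', onclick=re.compile(r'startDownload'))   # locate the download button
        if button is None:
            continue
        # pull the argument out of startDownload('ekrxfgbuiy')
        name = re.search(r"startDownload\('([^']+)'\)", button['onclick']).group(1)
        r = s.get(f'http://foo.com/files/{name}.zip')                        # assumed file location
        with open(f'{name}.zip', 'wb') as f:
            f.write(r.content)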
3. But what should I do if opening that URL or file path just redirects me to the login page? That is an extra barrier on top of Step 2. Once you remove it, the rest is the same as Step 2.
Start by logging in the normal way, in a real browser. While you do, keep the browser's developer tools open and watch what is sent over the network: which cookies, session values, or headers carry the authentication. Suppose, for example, that you are lucky and the login always succeeds as long as a certain token value is present in the request headers. Then it is easy to state: when you call .get(url, headers=...), just include {token: "some value"} in headers, and the rest is the same as Step 2.
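A minimal sketch of that lucky case, where the login endpoint, the form field names, and the token header are all hypothetical placeholders for whatever you actually see in the developer tools:

import requests

myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
    'token': 'some value copied from the developer tools',    # hypothetical header the site checks
}

with requests.Session() as s:                                  # the Session keeps any cookies the login sets
    s.post('http://foo.com/login.php',                         # hypothetical login endpoint
           data={'userId': '?????', 'passwd': '??????????'},   # your real credentials go here
           headers=myheaders)
    # now fetch a file exactly as in Step 2
    r = s.get('http://foo.com/files/ekrxfgbuiy.zip', headers=myheaders)
    with open('ekrxfgbuiy.zip', 'wb') as f:
        f.write(r.content)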
This is not something you can do by fixing a few lines of code; you have to analyze the site itself and work out how to get through. This is my first time ever visiting exam4u.com. I don't know how its login works, how it authenticates users, or what a consistent way to get the files you want would look like. You probably know that site better than I do.
In that sense, let's go back to the drawing board.