Is there a way to wait for loading when crawling with python requests?

Asked 1 year ago, Updated 1 year ago, 109 views

import json
import requests

url = 'https://www.url-test.co.kr'  # requests needs a scheme (http:// or https://)
req = requests.get(url)
json_load = json.loads(req.text)
json_res = json_load['rest']
.
.
.

This is how I use requests to retrieve information from a specific site. To be exact, I get the information from a paid API, but the response takes some time to load when I access the API address, maybe because the server is slow or because my internet is slow. So when I look up the key 'test' right after requests.get(url), I get an error saying the key can't be found, apparently because the data hasn't finished loading. I tried adding time.sleep around the request to slow things down as much as possible, but I don't think that can solve the loading delay. Is there a way, as with selenium, to wait for loading and only make the request once the data has been completely delivered?

python crawler loading requests

2022-09-22 08:17

1 Answer

Is what you are fetching with requests an HTML document?

Or is it just a data format like JSON/XML?

Judging from the question, I think you're building a crawler that scrapes the content of an ordinary web page itself (an HTML document).

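If that's the case, there is a high possibility that the page is being filled in with Ajax data fetched by JavaScript while or after the page is rendered. In this situation, sleep will never solve it: requests.get() has already returned everything the server sent, and waiting afterwards does not make any JavaScript run.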

A typical crawler only receives the document the server sends back, as a string. It does not parse that string into DOM elements or run JavaScript. That's what browsers do.
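To find out which of the two cases applies, it can help to look at the raw string that came back before indexing into it. The following is only a minimal sketch: describe_body is a hypothetical helper name, and the HTML check is a rough heuristic rather than a real parser.

```python
import json


def describe_body(text):
    """Roughly classify a response body as 'json', 'html', or 'other'."""
    try:
        json.loads(text)  # raises ValueError (JSONDecodeError) if not JSON
        return "json"
    except ValueError:
        pass
    # Heuristic only: look for an HTML marker near the start of the body.
    head = text.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        return "html"
    return "other"


# Usage with requests (not executed here):
#   resp = requests.get(url, timeout=30)
#   print(resp.status_code, resp.headers.get("Content-Type"))
#   print(describe_body(resp.text))
```

If this reports 'html', the data is probably filled in by JavaScript and a browser is needed. If it reports 'json' but the key is missing, printing the body itself usually shows what the server actually answered, for example an error message.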

Even though BeautifulSoup lets you parse HTML documents, it doesn't run JavaScript, so you won't see any additional data that JavaScript loads.

You'll have to use either Selenium, which drives a real browser, or another library that includes a headless browser.
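As a rough sketch of the Selenium route, the function below loads a page in headless Chrome and waits until a given element shows up before returning the rendered HTML. It assumes Selenium 4 and a local Chrome installation. The function name fetch_rendered_html and the css_selector argument are hypothetical, and you would need to pick a selector that matches an element the page's JavaScript actually inserts.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Load `url` in headless Chrome and return the HTML once `css_selector` appears."""
    # Imported inside the function so the rest of a script still loads
    # on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element is present in the DOM;
        # raises TimeoutException if it never appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call might look like fetch_rendered_html('https://www.url-test.co.kr', '#result'), where '#result' is only a placeholder selector. Unlike a fixed sleep, the explicit wait returns as soon as the element exists and fails with a clear error if it never does.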


2022-09-22 08:17


