Crawling data from dynamic pages (Python), something I would even pay to learn

Asked 1 year ago, Updated 1 year ago, 85 views

When fetching stock price information from KRX or Daum Securities, if the page content changes with the URL, I can already crawl it by inserting the stock code into the URL as a variable and requesting the page.

However, some web pages work dynamically. For example, the URL does not change: you pick a date in a calendar form on the page and press the "Inquiry" button, and only the information for that date appears.

Using Python's "from selenium import webdriver", I managed to open a Chrome browser and click through the inputs for each item, but I would like to ask whether there is a way to get the data straight from the web server.

Current method: launch the Chrome browser -> send click commands with selenium -> load the page -> read the page data

What I want: send the variables directly to the web server -> read the data

I would like to look this up in a book or on Google, but searching for "Python, dynamic page, crawling" only turns up the selenium method described above.

What should I search for to do it the way I described? If I need to read a book, please tell me the title or which part to look at, and I will look it up myself!

python3.4

2022-09-22 20:05

1 Answer

There is really only one problem: how do you run JavaScript and get its results?

Because of this problem, people end up driving a heavyweight browser through selenium, which is inefficient.

Of course, it is an easy problem if only the GET/POST variables change. But web apps these days are rarely like that. Typically the page receives JSON dynamically, processes the objects in JavaScript, and uses the result to make Ajax calls to the server.
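For the easy case, where only the GET/POST variables change, a minimal sketch using only the standard library might look like the following. The endpoint URL and the form field names (`code`, `date`) are hypothetical placeholders, not taken from KRX or Daum. The real ones would be whatever request the page sends when the "Inquiry" button is pressed, which can usually be read from the Network tab of the browser's developer tools.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace it with the URL the page actually calls
# when the "Inquiry" button is pressed.
ENDPOINT = "https://example.com/stock/daily"


def build_request(stock_code, date):
    """Build the POST request the page would send for one stock and date."""
    form = {"code": stock_code, "date": date}  # assumed parameter names
    body = urllib.parse.urlencode(form).encode("utf-8")
    # Passing data= makes urllib send this as a POST request.
    return urllib.request.Request(ENDPOINT, data=body)


def fetch_json(request, timeout=10):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_request("005930", "20220922")
    # Inspect the request without touching the network.
    print(request.get_method(), request.full_url, request.data)
    # data = fetch_json(request)  # enable once ENDPOINT is a real URL
```

Building the request is kept separate from sending it so the parameters can be checked against what the browser sends before any traffic reaches the server.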

The catch is that the JavaScript has to be executed, and Python cannot do that by itself.

Of course there is a way. You can use a JavaScript engine such as SpiderMonkey to execute the JavaScript. Keep in mind that it is literally just an engine, not a browser, so there is no DOM.

If you want to try it, please refer to the following project:

https://github.com/doloopwhile/PyExecJS
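As a rough sketch of what using PyExecJS looks like: it compiles a piece of JavaScript with `execjs.compile` and calls one of its functions with `call`. The JavaScript function `makeToken` and the wrapper `call_js` are made up for illustration. On a real site you would paste in the script the page uses to compute its request parameters. PyExecJS also needs a JavaScript runtime such as Node.js installed on the machine, and the project has been marked end-of-life by its author, so treat this as a starting point only.

```python
# Hypothetical JavaScript lifted from a page: a function that derives a
# request token from the stock code and the date.
JS_SOURCE = """
function makeToken(code, date) {
    return code.split("").reverse().join("") + "-" + date;
}
"""


def call_js(source, function_name, *args):
    """Compile JavaScript source and call one of its functions from Python."""
    # PyExecJS is imported here so this module still loads without it.
    import execjs

    context = execjs.compile(source)  # uses whichever JS runtime is found
    return context.call(function_name, *args)
```

Calling `call_js(JS_SOURCE, "makeToken", "005930", "20220922")` returns the token as a Python string. Because there is no DOM, this only works for scripts that do not touch `document` or `window`.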


2022-09-22 20:05



© 2024 OneMinuteCode. All rights reserved.