Which one should I use for crawling in Python?

Asked 2 years ago, Updated 2 years ago, 121 views

Hi, everyone. I mainly develop web applications in PHP. We are planning to build a crawling program that collects the data we need about orders/inquiries/claims from the seller (admin) pages of several shopping malls, and that can also push updates back to each mall in bulk, for example by sending invoice numbers.

Currently I have implemented it with Selenium and the Chrome WebDriver (I also tried Firefox), but because of the various problems below I think it would be better to find another approach, so I am asking this question.

First of all, the hardest part is collecting order data. Scraping and parsing what is shown on the screen was not hard (whether with Selenium or bs...). But that is only a small part of the data; to get the rest, you usually have to click on an order number and then parse the dynamically generated (Ajax) content. This approach is far too slow, because I have to click every order number one by one.
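For reference, this is roughly what I imagine the faster approach would look like: if the order-detail pop-up is filled by an Ajax call (visible in the browser DevTools Network tab), that endpoint could be called directly with requests instead of clicking each order in a browser. The URL, parameter names, and session cookie below are made-up placeholders.

    import requests

    session = requests.Session()
    # Reuse the cookies of an already logged-in session (hypothetical cookie name/value).
    session.cookies.update({"SESSIONID": "paste-your-session-cookie-here"})

    def fetch_order_detail(order_no):
        # Call the endpoint that the "order number" click triggers in the browser.
        resp = session.get(
            "https://seller.example-mall.com/api/orders/detail",  # hypothetical URL
            params={"orderNo": order_no},
            headers={"X-Requested-With": "XMLHttpRequest"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()  # or parse resp.text with BeautifulSoup if it returns HTML

    for order_no in ["202209220001", "202209220002"]:
        print(fetch_order_detail(order_no))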

With Selenium, the user cannot touch the browser while it is being controlled, and if anything unexpected appears on the site (a layer pop-up, or a prompt to change your password after logging in), the script throws an error.
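One thing I could probably do to keep those surprises from killing the run is to treat pop-ups and interstitials as optional elements and click them away only if they appear. A rough sketch (the URL and selectors are made-up examples):

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    def dismiss_if_present(driver, css_selector, timeout=2):
        # Click the element if it shows up within `timeout` seconds, otherwise move on.
        try:
            element = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
            )
            element.click()
        except TimeoutException:
            pass  # the pop-up did not appear this time

    driver = webdriver.Chrome()
    driver.get("https://seller.example-mall.com/login")  # hypothetical URL
    # ... log in ...
    dismiss_if_present(driver, ".layer-popup .btn-close")     # hypothetical selector
    dismiss_if_present(driver, "#btn-change-password-later")  # hypothetical selector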

Downloading files with bs was difficult. When you export data to Excel from a shopping mall admin page, the Excel file is usually generated dynamically and then downloaded. When you implement that sort of thing in PHP, you normally set response headers so the browser downloads the file automatically, and these sites all seem to handle it that way.
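If the export button really is just an HTTP request that responds with the file and a Content-Disposition header, then a logged-in requests.Session should be able to download it without a browser. A rough sketch (the URL, parameters, and cookie are made-up placeholders):

    import re
    import requests

    session = requests.Session()
    session.cookies.update({"SESSIONID": "paste-your-session-cookie-here"})  # hypothetical

    resp = session.post(
        "https://seller.example-mall.com/orders/excel",  # hypothetical export URL
        data={"dateFrom": "2022-09-01", "dateTo": "2022-09-22"},
        timeout=30,
    )
    resp.raise_for_status()

    # Use the file name suggested by the server, with a fallback.
    disposition = resp.headers.get("Content-Disposition", "")
    match = re.search(r'filename="?([^";]+)', disposition)
    filename = match.group(1) if match else "orders.xlsx"

    with open(filename, "wb") as f:
        f.write(resp.content)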

Selenium can download the Excel file, but it is slow overall because it drives an actual browser. And whenever something happens on the site that the code does not account for, it throws an error without fail.
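If I keep Selenium, I suppose I could at least run Chrome headless with a preset download directory to cut some of the overhead and skip the download prompt. A sketch (whether headless downloads work this way may depend on the Chrome/chromedriver version, and the directory path is just an example):

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # newer headless mode
    options.add_experimental_option("prefs", {
        "download.default_directory": "/tmp/mall-exports",  # example path
        "download.prompt_for_download": False,
    })
    driver = webdriver.Chrome(options=options)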

With bs I could not download the dynamically generated files. For Coupang, the path of the generated order Excel file is different every time. I tried so hard to dig up that path that I gave up halfway through ㅠ.ㅠ
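One pattern I have seen elsewhere (I cannot confirm that Coupang works this way) is that the export request itself returns the freshly generated file path in its response; if so, the path could be read from that response instead of guessed. Everything below (URLs and the response field name) is a made-up placeholder:

    import requests

    session = requests.Session()
    session.cookies.update({"SESSIONID": "paste-your-session-cookie-here"})  # hypothetical

    # Step 1: ask the server to generate the Excel file (hypothetical URL/params).
    create = session.post(
        "https://seller.example-mall.com/orders/excel/create",
        data={"dateFrom": "2022-09-01", "dateTo": "2022-09-22"},
        timeout=30,
    )
    create.raise_for_status()
    file_url = create.json()["filePath"]  # hypothetical field holding the generated path

    # Step 2: download the file from the path the server just returned.
    excel = session.get(file_url, timeout=30)
    excel.raise_for_status()
    with open("coupang_orders.xlsx", "wb") as f:
        f.write(excel.content)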

What's left is Scrapy, but it doesn't look much different from bs to me. How do people usually implement this?

I need your help. ㅠ.ㅠ I'm fumbling my way through learning Python.

I have to build this somehow to make a living. Please, I'm desperate. (__)

python crawling

2022-09-22 19:20

1 Answer

I think you can generate the "Coupang order Excel file" with Coupang's official API. Auction and 11st also have official APIs, so why don't you use those wherever possible? As far as I know, crawling should always be the last resort, just as exec() is a last resort for certain PHP problems.
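For illustration, a generic sketch of the API-first approach (the endpoint, parameters, and auth header are hypothetical placeholders; each mall's real seller API has its own authentication, Coupang's Open API for instance uses HMAC-signed requests, so follow the official documentation):

    import requests

    resp = requests.get(
        "https://openapi.example-mall.com/v1/orders",  # hypothetical endpoint
        params={"createdAtFrom": "2022-09-01", "createdAtTo": "2022-09-22"},
        headers={"Authorization": "Bearer YOUR_API_KEY"},  # hypothetical auth scheme
        timeout=10,
    )
    resp.raise_for_status()
    for order in resp.json().get("data", []):  # hypothetical response shape
        print(order)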


2022-09-22 19:20
