Crawling a dynamic convenience store page with Python: how do I get several fields from multiple pages? I've been teaching myself for two weeks and still can't do it T_T

Asked 2 years ago, Updated 2 years ago, 107 views

I'm crawling the store list of a convenience store chain.

Specifically, I'd like to crawl the search results filtered by conditions such as stores located in Seoul and stores that sell Lotto tickets!

As a first step I crawled the list without any conditions, but the result contains only the first page.

How do I get the rest of the pages?

>>> from bs4 import BeautifulSoup
>>> from selenium import webdriver
>>> driver = webdriver.Chrome('chromedriver.exe')  # path to the ChromeDriver binary
>>> driver.get('http://cu.bgfretail.com/store/list.do?category=store')
>>> html = driver.page_source  # HTML of the first result page only
>>> soup = BeautifulSoup(html, 'html.parser')
>>> name_list = soup.find_all("span", {"class": "name"})  # store names
>>> for name in name_list:
...     print(name.get_text())
...
<Result>
1st Batter Daewoo Commercial Car Credit Union
a two-stroke credit union
419 Intersection Point
63 Building Branch
6 Industrial Complex 1st store
>>> 

Question: How do I get the store name, phone number, and address from several pages at once?

The information I want to get is the store name, phone number, and address!

Until now I had only used Java, but I ran into its limits, so I tried Python for the first time. I've read several documents over the past two weeks, but I couldn't solve it. Please help me, experts T_T

python crawling

2022-09-22 19:10

1 Answer

With Selenium, you can reach the other pages by triggering a click event on the page numbers at the bottom of the list.

But for the site in the question, you don't need something as heavy as Selenium.

The address that actually returns the results is as follows.

http://cu.bgfretail.com/store/list_Ajax.do?pageIndex=2&listType=&jumpoCode=&jumpoLotto=&jumpoToto=&jumpoCash=&jumpoHour=&jumpoCafe=&jumpoDelivery=&jumpoBakery=&jumpoFry=&jumpoAdderss=&jumpoSido=&jumpoGugun=&jumpodong=&user_id=&sido=&Gugun=&jumpoName=
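The page number is carried by the pageIndex parameter of this address. As a minimal sketch, the address for any page can be generated as shown here. The names `BASE_URL`, `EMPTY_FILTERS`, and `build_page_url` are made up for illustration. The values the site expects for filters such as jumpoLotto or jumpoSido are not given here, so they would have to be read from the browser's network tab.

```python
from urllib.parse import urlencode

BASE_URL = "http://cu.bgfretail.com/store/list_Ajax.do"

# Parameter names copied from the address above, in the same order.
# All of them are left empty, which corresponds to "no search condition".
EMPTY_FILTERS = [
    "listType", "jumpoCode", "jumpoLotto", "jumpoToto", "jumpoCash",
    "jumpoHour", "jumpoCafe", "jumpoDelivery", "jumpoBakery", "jumpoFry",
    "jumpoAdderss", "jumpoSido", "jumpoGugun", "jumpodong", "user_id",
    "sido", "Gugun", "jumpoName",
]


def build_page_url(page_index, **filters):
    """Return the result address for one page (hypothetical helper).

    Keyword arguments override the empty filter values; which values the
    server accepts is an assumption to verify against real requests.
    """
    params = {"pageIndex": page_index}
    params.update({name: "" for name in EMPTY_FILTERS})
    params.update(filters)
    return BASE_URL + "?" + urlencode(params)


# Addresses for pages 1 to 3
for page in range(1, 4):
    print(build_page_url(page))
```

Calling `build_page_url(2)` reproduces the address above exactly, so only the number passed in changes from page to page.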

However, if you just paste this address into your browser, it shows no results.

The reason seems to be that the server checks some values in the request headers.

Typically, such checks look at values like Referer and Host.

If you send the request with the header values below and change only the pageIndex=2 value in the URL, the results for each page are returned.

Host: cu.bgfretail.com
Connection: keep-alive
Content-Length: 207
Accept: text/html, */*; q=0.01
Origin: http://cu.bgfretail.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://cu.bgfretail.com/store/list.do?category=store
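Putting this together, here is a rough sketch of a page loop without Selenium. It uses only the Python standard library so it runs without extra installs; BeautifulSoup, as used in the question, would work just as well for the parsing step. Several things are assumptions: that the returned HTML marks store names with `span class="name"` as on the first page, that the response is UTF-8, and that a page with no names means the end of the list. The helper names are made up for illustration. Host, Connection, and Content-Length are left out because the HTTP client fills them in itself. The Content-Length and Content-Type lines suggest the browser may send the parameters as a POST form body, so if this GET version returns nothing, that is the first thing to try.

```python
import time
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Address from the answer, with the page number left as a placeholder.
AJAX_URL = (
    "http://cu.bgfretail.com/store/list_Ajax.do?pageIndex={page}"
    "&listType=&jumpoCode=&jumpoLotto=&jumpoToto=&jumpoCash=&jumpoHour="
    "&jumpoCafe=&jumpoDelivery=&jumpoBakery=&jumpoFry=&jumpoAdderss="
    "&jumpoSido=&jumpoGugun=&jumpodong=&user_id=&sido=&Gugun=&jumpoName="
)

# Header values taken from the list above.
HEADERS = {
    "Accept": "text/html, */*; q=0.01",
    "Origin": "http://cu.bgfretail.com",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    ),
    "Referer": "http://cu.bgfretail.com/store/list.do?category=store",
}


class StoreNameParser(HTMLParser):
    """Collects the text of every <span class="name"> element.

    Assumes the name spans contain plain text without nested spans.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "name" in classes:
            self._in_name = True
            self._buf = []

    def handle_data(self, data):
        if self._in_name:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_name:
            self.names.append("".join(self._buf).strip())
            self._in_name = False


def parse_names(html):
    """Return the store names found in one page of HTML."""
    parser = StoreNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


def build_request(page):
    """Build a GET request for one page with the headers set."""
    return Request(AJAX_URL.format(page=page), headers=HEADERS)


def fetch_page(page):
    """Download one page; the UTF-8 decoding is an assumption."""
    with urlopen(build_request(page), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(max_pages):
    """Collect names page by page, stopping at the first empty page."""
    all_names = []
    for page in range(1, max_pages + 1):
        names = parse_names(fetch_page(page))
        if not names:
            break
        all_names.extend(names)
        time.sleep(1)  # pause between requests to go easy on the server
    return all_names
```

Something like `for name in crawl(max_pages=5): print(name)` would then print the names from the first five pages. The class names for the phone number and address are not shown in the question, so to collect those fields you would print one fetched page, look up the matching tags, and extend the parser in the same way.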


2022-09-22 19:10

© 2024 OneMinuteCode. All rights reserved.