Python bs4 and requests: how do I crawl a dynamic page?

Asked 2 years ago, Updated 2 years ago, 93 views

I am a beginner who has been studying Python for a month. My assignment is to use the requests and Beautiful Soup libraries to search Coupang, Wemakeprice, and Tmon for products sorted by lowest price, and to collect each product's name and price in dictionary form. Coupang and Wemakeprice were finished easily, but the same code does not work for Tmon. When I searched online, people said I could use a framework called Selenium, but my tutor told me to solve it with Beautiful Soup and requests instead of Selenium, and I don't know how.

f = open('tmon.html', 'wb')  # save the raw response body to a file so it can be inspected
f.write(req.content)
f.close()

I saved the response with the code above and checked it, and there was no markup related to the product list I searched for; instead there was an error message saying the search results could not be retrieved because the service is unavailable. How do I crawl a dynamic page with only bs4 and requests? My idea was not to read the response immediately, but to wait until the product list had loaded, so I added timeout=5 to the request as in the code below, but the result is no different. How should I approach this?

 req = requests.get("http://search.tmon.co.kr/search/?keyword=%s&thr=ts#low-order" % (keyword), headers=self.headers, timeout=5)
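
For reference, a quick check along these lines (the keyword and User-Agent below are placeholders) shows why the timeout cannot help: timeout only bounds how long requests waits for the server to answer, and requests never executes the page's JavaScript, so the dynamically loaded product list is missing from the raw HTML no matter how long you wait.

import requests

# Hypothetical check: a longer timeout does not change the returned HTML, because the
# product list is injected by JavaScript after the page loads, which requests never runs.
url = "http://search.tmon.co.kr/search/?keyword=%s&thr=ts#low-order" % "camera"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
html = requests.get(url, headers=headers, timeout=30).text
print('class="tx"' in html)  # the product-name markup the parser looks for; expected False here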

The code below is the entire code I wrote.

import requests
from bs4 import BeautifulSoup

class tmon(object):
    headers = {}  # Class-level attribute so it can be used by every method in the class, not only search; left empty because the user may use a browser other than Chrome.

    def __init__(self, headers={}):  # headers is taken as a parameter so the caller can pass in their own.
        if not headers:
            self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'}
        else:
            self.headers = headers
        # If no headers value is given when the class is created, fall back to the default; otherwise use the value that was passed in.

    def search(self, keyword):  # The search keyword is passed in as a parameter.
        product_list = []  # Empty list that will hold the results.
        req = requests.get("http://search.tmon.co.kr/search/?keyword=%s&thr=ts#low-order" % (keyword), headers=self.headers)  # Request the Tmon search page with GET, sending the User-Agent stored in headers.

        content = req.content  # Keep only the body of the response.

        soup = BeautifulSoup(content, 'lxml')  # Parse the content with Beautiful Soup using the lxml parser.

        product = soup.find("ul", {'class': 'list'}).find_all("li")  # Find the tags that contain the name and price of each searched product.

        for i in product:  # The price and product name live in different tags inside each li tag, so extract them in the loop.
            proName = i.find("strong", {'class': 'tx'}).text  # Text of the product name only.
            proPrice = i.find("i", {'class': 'num'}).text  # Text of the product price only.
            proPrice = int(proPrice.replace(",", ""))  # The price is a string, so remove the commas and convert it to an integer.

            result = {'proName': proName, 'proPrice': proPrice}  # Put the product name and price into one dictionary.
            product_list.append(result)  # Append the dictionary to the list of results.

        return product_list  # Return the list containing the product information.

if __name__ == "__main__":
    test = tmon()
    print(test.search("action cam"))

python crawling

2022-09-22 18:15

1 Answer

The link below is the address that returns the actual search results as JSON.

Since it's JSON, it will be easier to work with.

http://search.tmon.co.kr/api/search/v4/deals?_=1573378672962&keyword=%EC%83%B4%ED%91%B8&useTypoCorrection=true&mainDealOnly=true&page=1&sortType=POPULAR&thr=ts
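
For illustration, here is a minimal sketch of calling that endpoint with requests. The keyword is a placeholder, and the structure of the JSON body is not documented in the answer, so the exact keys holding the product name and price still need to be checked in the response.

import requests

# Minimal sketch, assuming the endpoint above still responds: query the JSON search API
# directly instead of parsing the HTML page. The field names inside the response are not
# known here, so inspect the top-level keys first to find where the deal list lives.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'}
params = {
    'keyword': 'action cam',      # placeholder search term
    'useTypoCorrection': 'true',
    'mainDealOnly': 'true',
    'page': 1,
    'sortType': 'POPULAR',
    'thr': 'ts',
}
resp = requests.get('http://search.tmon.co.kr/api/search/v4/deals',
                    params=params, headers=headers, timeout=5)
data = resp.json()            # the body is JSON, so it parses straight into dicts and lists
print(list(data.keys()))      # inspect the structure, then build {'proName': ..., 'proPrice': ...} from it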


2022-09-22 18:15
