Question about crawling with Python

Asked 2 years ago, Updated 2 years ago, 55 views

Hello, everyone. I'm learning Python from scratch, and I got stuck while crawling information from a web page. I searched Google extensively but couldn't find a clear solution, so I'm asking for your help.

Question: when I run the code below, if the response code is 200 but the page has no URL information, the entire result is printed as None, even when the name and phone number are present. I think something is wrong with my exception handling. How can I change the code so that only the URL field is printed as None?

Note: in the return statement (find_nm.text, find_cat.text, find_tell.text, find_adr.text, find_url.text), if I remove .text, the output includes a lot of extraneous information, so I want to keep it.

//
import requests
from bs4 import BeautifulSoup


def crawl(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
    data = requests.get(url=url , headers=headers)
    print(data,"/", url,"/",end = ' ')
    return data.content

def parse(pageString):
    try :
        bsObj = BeautifulSoup(pageString, "html.parser")
        #Name
        find_nm = bsObj.find("strong",{"class" : "name"})
        #Category
        find_cat = bsObj.find("span", {"class" : "category"})
        #Phone number
        find_tell = bsObj.find("div", {"class": "txt"})
        #Address
        find_adr = bsObj.find("span", {"class": "addr"})
        # Homepage
        find_url = bsObj.find("a", {"class": "biz_url"})
        return find_nm.text, find_cat.text, find_tell.text, find_adr.text, find_url.text
    except  :
        pass
def printCompanyInfo(code):
    url = "https://store.naver.com/restaurants/detail?id={}".format(code)
    pageString = crawl(url)
    companyInfo = parse(pageString)
    print(companyInfo)

printCompanyInfo("33696029")
printCompanyInfo("13317484")
printCompanyInfo("32287256")
printCompanyInfo("37322689")
printCompanyInfo("36772108")
printCompanyInfo("413454114")
printCompanyInfo("31621852")
printCompanyInfo("13303181")
printCompanyInfo("37127150")
printCompanyInfo("34565498")
//

python crawling

2022-09-21 20:48

1 Answer

except Exception as e:
    print(e)
    print(find_nm, find_cat, find_tell, find_adr, find_url)

If you print the values like this, you can see that when a page has no homepage, find_url is None, so find_url.text raises an error.
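To see the failure in isolation, here is a minimal sketch that needs no network request. It assumes only that find() returned None for the homepage link, which is what BeautifulSoup does when nothing matches. Because all five .text reads in the question sit inside one try block with a bare except that just passes, this single error makes parse() return None for the whole record.

```python
# What bsObj.find("a", {"class": "biz_url"}) returns when the page has no homepage link.
find_url = None

try:
    find_url.text  # reading .text on None raises AttributeError
except AttributeError as e:
    message = str(e)
    print(message)  # 'NoneType' object has no attribute 'text'
```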

hp_url = find_url.text if find_url else ''

I think you can handle it like this.
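The same check can be applied to every field, not just the homepage, so that one missing element no longer wipes out the rest. The following is only a sketch under a few assumptions: text_or_none is a hypothetical helper name, the selectors are copied from the question, and the sample HTML is made up for illustration (it is not the real page markup).

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the tag's stripped text, or None when find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def parse(page_string):
    soup = BeautifulSoup(page_string, "html.parser")
    # Same (tag, class) pairs as in the question; each lookup may return None.
    selectors = [
        ("strong", "name"),    # name
        ("span", "category"),  # category
        ("div", "txt"),        # phone number
        ("span", "addr"),      # address
        ("a", "biz_url"),      # homepage
    ]
    return tuple(
        text_or_none(soup.find(tag, {"class": cls})) for tag, cls in selectors
    )


# Made-up sample page with no homepage link.
sample = """
<strong class="name">Sample Diner</strong>
<span class="category">Korean</span>
<div class="txt">02-000-0000</div>
<span class="addr">Seoul</span>
"""

print(parse(sample))
# ('Sample Diner', 'Korean', '02-000-0000', 'Seoul', None)
```

With this shape the try/except in parse() is no longer needed for missing elements, so real errors are not silently swallowed.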


2022-09-21 20:48



© 2024 OneMinuteCode. All rights reserved.