I have a question about Python web crawling.

Asked 2 years ago, Updated 2 years ago, 117 views

import requests

response = requests.get('http://?????/')

html = response.text

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

for tag in soup.select('tbody'):
    print(tag.text)

I deleted the address because it's for internal use.

Out of all the tags in tbody, I would like to print only two types: strong and lmis. I want all of the strong values printed first, followed by all of the lmis values. I also want an lmis value to be printed only if it is in number format and is 1000 or more. Please help me.

For example, from that source I want to print two things, IFIA10 and 1,107, where the number 1,107 represents time in hours. I only want to print it if it's over 1000 hours.

python web-crawling

2022-09-21 19:41

1 Answer

I haven't tested this, so please treat it only as a reference.

strong_tags = soup.tbody.find_all('strong')
lmis_tags = soup.tbody.find_all('lmis')

# Loop over a copy: removing items from the list being iterated would skip elements
for i in lmis_tags[:]:
    if int(i.get_text().replace(',', '')) < 1000:
        lmis_tags.remove(i)

I think you could also use the filter function instead of the for statement above:

lmis_tags = list(filter(lambda i: int(i.get_text().replace(',', '')) >= 1000, lmis_tags))
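
One thing the code above does not cover is the "only if lmis is a number format" part of the question: calling int() directly raises a ValueError when a value is not numeric. Below is a minimal sketch of one way to handle that. The names parse_hours and hours_at_least are hypothetical helpers made up for this example (they are not part of BeautifulSoup), and the sample strings stand in for the text of the real tags, since the actual page is internal.

```python
def parse_hours(text):
    """Hypothetical helper: return text as an int, or None if it is not a number."""
    try:
        # Strip whitespace and thousands separators, e.g. ' 1,107 ' -> '1107'
        return int(text.strip().replace(',', ''))
    except ValueError:
        return None


def hours_at_least(texts, minimum=1000):
    """Hypothetical helper: keep only numeric values that are at least `minimum`."""
    parsed = (parse_hours(t) for t in texts)
    return [h for h in parsed if h is not None and h >= minimum]


# Stand-in strings; with BeautifulSoup these would come from
# [tag.get_text() for tag in lmis_tags].
sample_texts = ['1,107', '950', 'N/A', ' 2,300 ', '']
print(hours_at_least(sample_texts))
```

With this approach you could print every strong value first and then loop over the filtered list to print the remaining lmis values.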


2022-09-21 19:41



© 2024 OneMinuteCode. All rights reserved.