A question about parsing a page while web crawling with Python

Asked 2 years ago, Updated 2 years ago, 95 views

import requests
from bs4 import BeautifulSoup

response = requests.get('http://?????/')
html = response.text

soup = BeautifulSoup(html, 'html.parser')
for tag in soup.select('tbody'):
    print(tag.text)

I removed the address because it's for internal use.

Among all the tags inside tbody, I want to print only two: strong and lmis. When I print them separately, all the strong values come out first and then all the lmis values. I would like each pair printed together in the form strong/lmis. lmis is a number, and I want a row printed only if its lmis value is 1000 or higher. Please help.

python beautifulsoup

2022-09-22 18:27

3 Answers

It's not an in-house network, but a site created to monitor the equipment we use.

I'm a Python beginner, so I don't think I explained it well.

For example, from a row like "IFIA10 1,107" I want to extract these two values. lmis is a time in hours, so I want to print only the entries that are 1000 hours or more.


2022-09-22 18:27

print(tag.tr.td.font.strong.text)
print(tag.tr.td.lmis.text)

If you want to remove the comma and convert the value to an int,

a = tag.tr.td.lmis.text
print(int(a.replace(',', '')))

If you want to print only values of 1000 or more,

a = tag.tr.td.lmis.text
a = int(a.replace(',', ''))
if a >= 1000:
    print(a)
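
Putting the pieces above together, here is a minimal sketch that prints each pair as strong/lmis and keeps only values of 1000 or more. The real page is not shown in the question, so SAMPLE_HTML is made-up markup (IFIA11 and IFIA12 are invented names), and it assumes one strong and one lmis per tbody. The function name long_running is also just illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which the question does not show.
SAMPLE_HTML = """
<table>
  <tbody><tr><td><font><strong>IFIA10</strong></font><lmis>1,107</lmis></td></tr></tbody>
  <tbody><tr><td><font><strong>IFIA11</strong></font><lmis>850</lmis></td></tr></tbody>
  <tbody><tr><td><font><strong>IFIA12</strong></font><lmis>1,000</lmis></td></tr></tbody>
</table>
"""


def long_running(html, min_hours=1000):
    """Return 'name/hours' strings for blocks whose lmis value is >= min_hours."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tbody in soup.select("tbody"):
        strong = tbody.find("strong")
        lmis = tbody.find("lmis")
        if strong is None or lmis is None:
            continue  # skip blocks that lack either tag
        try:
            # "1,107" -> 1107
            hours = int(lmis.get_text(strip=True).replace(",", ""))
        except ValueError:
            continue  # skip values that are not plain numbers
        if hours >= min_hours:
            results.append(f"{strong.get_text(strip=True)}/{hours}")
    return results


for line in long_running(SAMPLE_HTML):
    print(line)
```

Using find with a None check instead of tag.tr.td.lmis avoids an AttributeError when a block is missing one of the tags. If the real page keeps several rows in one tbody, looping over soup.select('tr') instead would probably fit better.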


2022-09-22 18:27

I don't think td is being picked up. Is there a reason you used select?

print(soup.find('strong').text)
print(soup.find('lmis').text)

Could you try this after removing the "for tag ..." loop?
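
One thing to keep in mind with this approach: find returns only the first matching tag, so on a page with several devices it prints just one pair. A small sketch of the difference, using find_all and zip to line the values up. The markup and the names EQ-A and EQ-B are made up for illustration, and pairing by position assumes every strong has a matching lmis in the same order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page is not shown in the question.
SAMPLE = (
    "<tbody>"
    "<tr><td><strong>EQ-A</strong><lmis>1,107</lmis></td></tr>"
    "<tr><td><strong>EQ-B</strong><lmis>850</lmis></td></tr>"
    "</tbody>"
)

soup = BeautifulSoup(SAMPLE, "html.parser")

# find stops at the first match.
first_name = soup.find("strong").get_text()

# find_all returns every match in document order.
all_names = [t.get_text() for t in soup.find_all("strong")]
all_hours = [t.get_text() for t in soup.find_all("lmis")]

# Pair the two lists by position.
pairs = list(zip(all_names, all_hours))

print(first_name)
for name, hours in pairs:
    print(f"{name}/{hours}")
```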


2022-09-22 18:27



© 2024 OneMinuteCode. All rights reserved.