Getting Post Titles with Python Beautiful Soup

Asked 2 years ago, Updated 2 years ago, 123 views

I'm following a lecture in which you log in with Python Selenium, go to a bulletin board, and get the page source. When the word "weather" or "corona" appears in a post title, the title and content of that post are sent to a Telegram bot.

I'm asking for help because I can't extract the post titles.

# The code omitted above logs in with Selenium and creates `driver`
from bs4 import BeautifulSoup as bs

driver.get(url2)  # go to the board
driver.implicitly_wait(15)
html = driver.page_source
soup = bs(html, 'html.parser')
subjects = soup.select('.item-subject')  # select the post links by class
print(subjects)

When I run the code above, the output looks fine, as shown below.

(The post titles are "Why Today's Weather", "The weather is so cold", and "Wuhan Corona".)

# There are about 20 posts on a page, but only three are shown here

<a class="item-subject" href="https://123.com/bbs/board.php?bo_table=aboard01&amp;wr_id=5452">
<span class="orangered visible-xs pull-right wr-comment">
<i class="fa fa-comment lightgray"></i>
<b>+41</b> #Comments
</span>
<span class="wr-icon wr-new"></span> Why Today's Weather #New-post icon, followed by the title
        <b class="count orangered hidden-xs">+41</b> #Comments
</a>, 

<a class="item-subject" href="https://123.com/bbs/board.php?bo_table=aboard01&amp;wr_id=5451">
<span class="orangered visible-xs pull-right wr-comment">
<i class="fa fa-comment lightgray"></i>
<b>+19</b>
</span>
<span class="wr-icon wr-new"></span> The weather is so cold "T"
        <b class="count orangered hidden-xs">+19</b>
</a>,

 <a class="item-subject" href="https://123.com/bbs/board.php?bo_table=aboard01&amp;wr_id=5454">
<span class="orangered visible-xs pull-right wr-comment">
<i class="fa fa-comment lightgray"></i>
<b>+23</b>
</span>
<span class="wr-icon wr-new"></span> Wuhan Corona 
        <b class="count orangered hidden-xs">+23</b>
</a>

My question is about the next part. The lecture says that because select returns a list, you can't read the text from it directly, so you should loop over it and use select_one as shown below.

# The code omitted above logs in with Selenium and creates `driver`
driver.get(url2)  # go to the board
driver.implicitly_wait(15)
html = driver.page_source
soup = bs(html, 'html.parser')
subjects = soup.select('.item-subject')  # select the post links by class
for subject in subjects:
    print(subject.select_one('.item-subject').text)

When I change the class name used in the lecture to match my page and run the code above, the following error occurs:

Exception occurred. AttributeError
'NoneType' object has no attribute 'text'
  File "C:\py\status.py", line 101, in <module>
    print(subject.select_one('.item-subject').text)

Looking at the error, I think I need to pass a different class to select_one('.item-subject'). But in the page source there is no class that points to the title, so I don't know how to get it. How do I get the title?

python beautifulsoup

2022-09-21 13:53

1 Answer

subjects = soup.select('.item-subject')  # select by class
for subject in subjects:
    # print(subject.select_one('.item-subject').text)  # returns None, so .text fails
    print(subject.text)

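You shouldn't search for .item-subject inside .item-subject again. select_one only searches the descendants of the element it is called on, so it finds nothing there and returns None.

But this way, the text of every child node comes out, including the comment counts. To get closer to the title: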
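As a minimal sketch of why the error appears, the snippet below parses a one-post fragment modeled on the markup in the question (the `href="#"` and the shortened class list are simplifications for illustration) and shows that calling `select_one` on the element with its own class returns `None`.

```python
from bs4 import BeautifulSoup

# One post, simplified from the output shown in the question.
html = (
    '<a class="item-subject" href="#">'
    '<span class="wr-icon wr-new"></span> Wuhan Corona '
    '<b class="count">+23</b></a>'
)
soup = BeautifulSoup(html, "html.parser")
subject = soup.select_one(".item-subject")

# select_one() searches only the descendants of `subject`, never `subject`
# itself. No descendant has the class, so the result is None.
inner = subject.select_one(".item-subject")
print(inner)

# The element itself already holds the text (title plus comment count).
all_text = subject.get_text(" ", strip=True)
print(all_text)
```

Calling `.text` on that `None` is what raises the `AttributeError` in the question.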

subjects = soup.select('.item-subject')  # select by class
for subject in subjects:
    print(subject.select_one('span.wr-new').next_sibling)

Get as close to the title as you can this way, then strip out the unnecessary characters (such as the surrounding whitespace) before using it.
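Putting the pieces together, here is one possible sketch of the "get close, then filter" idea, combined with the keyword check the question describes. The names `SAMPLE_HTML`, `KEYWORDS`, `extract_title` and `matching_posts` are hypothetical. The third post (no icon span) is an invented case added only to show a fallback, since the question does not show how posts without the new-post icon are marked up.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup in the question, plus one invented post
# without an icon span to exercise the fallback path.
SAMPLE_HTML = """
<a class="item-subject" href="https://123.com/bbs/board.php?bo_table=aboard01&amp;wr_id=5451">
<span class="orangered visible-xs pull-right wr-comment"><b>+19</b></span>
<span class="wr-icon wr-new"></span> The weather is so cold
        <b class="count orangered hidden-xs">+19</b>
</a>
<a class="item-subject" href="https://123.com/bbs/board.php?bo_table=aboard01&amp;wr_id=5454">
<span class="orangered visible-xs pull-right wr-comment"><b>+23</b></span>
<span class="wr-icon wr-new"></span> Wuhan Corona
        <b class="count orangered hidden-xs">+23</b>
</a>
<a class="item-subject" href="#">Lunch menu</a>
"""

KEYWORDS = ("weather", "corona")  # words to watch for, from the question


def extract_title(subject):
    """Return the title text of one .item-subject link, whitespace stripped."""
    icon = subject.select_one("span.wr-icon")
    if icon is not None and isinstance(icon.next_sibling, NavigableString):
        # The title is the bare text node right after the icon span.
        return icon.next_sibling.strip()
    # Fallback: join the text nodes that sit directly inside the <a>,
    # which skips text nested in <span> or <b> children.
    direct = subject.find_all(string=True, recursive=False)
    return " ".join(s.strip() for s in direct if s.strip())


def matching_posts(html, keywords=KEYWORDS):
    """Return (title, href) pairs whose title contains any keyword."""
    soup = BeautifulSoup(html, "html.parser")
    hits = []
    for subject in soup.select("a.item-subject"):
        title = extract_title(subject)
        if any(word in title.lower() for word in keywords):
            hits.append((title, subject.get("href")))
    return hits


for title, link in matching_posts(SAMPLE_HTML):
    print(title, link)
```

With the real page, `SAMPLE_HTML` would be replaced by `driver.page_source`. The comparison is lowercased so that "Weather" and "weather" both match; whether that is wanted depends on the board.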


2022-09-21 13:53



© 2024 OneMinuteCode. All rights reserved.