Python Beautiful Soup Web Page Option Information Crawling

Asked 1 year ago, Updated 1 year ago, 130 views

HTML information.

<div class=:selectScroll">
    <div class ="wrap">
    <div class ="box" id ="option_all_view_area">
        <a class = "optionLine" name ="opt_select" optprdno="7042309496">
            <div class ="number">1</div>
            <div class ="option" name="opt_veiw_all_name_area">/225
        <a class = "optionLine" name ="opt_select" optprdno="7042309496">
            <div class ="number">2</div>
            <div class ="option" name="opt_veiw_all_name_area">/230
        <a class = "optionLine" name ="opt_select" optprdno="7042309496">
            <div class ="number">3</div>
            <div class ="option" name="opt_veiw_all_name_area">/235
        <a class = "optionLine" name ="opt_select" optprdno="7042309496">
            <div class ="number">4</div>
            <div class ="option" name="opt_veiw_all_name_area">/240

Here, I would like to get the shoe sizes (the text) from the elements with class="option" (225, 230, 235, 240).

option_name=soup.find('a', class_='optionLine').find('div', class_='option').text

When I use this, I get only the first (top) option value...

option_name=soup.find_all('a', class_='optionLine').find_all('div', class_='option').text

But when I use this, I get an error...

I think I should get each value separately by its option number, but I don't know how to do that.

Thank you.

python-3.x python beautifulsoup

2022-09-21 11:31

1 Answer

The HTML is a mess. There are <div> elements inside the inline <a> tags, and neither the <div> nor the <a> tags are closed...

So, assuming that the markup below is the original, I'll answer:

<div class="selectScroll">
    <div class="wrap">
        <div class="box" id="option_all_view_area">
            <a class= "optionLine" name="opt_select" optprdno="7042309496">
                <div class="number">1</div>
                <div class="option" name="opt_veiw_all_name_area">225</div>
            </a>
            <a class= "optionLine" name="opt_select" optprdno="7042309496">
                <div class="number">2</div>
                <div class="option" name="opt_veiw_all_name_area">230</div>
            </a>
            <a class= "optionLine" name="opt_select" optprdno="7042309496">
                <div class="number">3</div>
                <div class="option" name="opt_veiw_all_name_area">235</div>
            </a>
            <a class= "optionLine" name="opt_select" optprdno="7042309496">
                <div class="number">4</div>
                <div class="option" name="opt_veiw_all_name_area">240</div>
            </a>
        </div>
    </div>
</div>

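You can fix the code like this.

from bs4 import BeautifulSoup

# page_source is assumed to hold the page's HTML as a string
soup = BeautifulSoup(page_source, "html.parser")
optionLines = soup.find_all('a', class_='optionLine')

# Look up the option <div> inside each <a> tag separately
resultArr = []
for ele in optionLines:
    resultArr.append(ele.find('div', class_='option').text)

print(resultArr) # ['225', '230', '235', '240']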

The find() method returns only the first tag that matches the given condition. Because it searches from the top of the document, you get only 225.
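To make the difference concrete, here is a minimal sketch that runs find() and find_all() on a trimmed-down copy of the cleaned-up markup. The shortened HTML and the class name "no-such-class" are made up for illustration; they are not from the original page.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the cleaned-up markup (only two options, fewer attributes).
html = """
<div class="box" id="option_all_view_area">
    <a class="optionLine"><div class="number">1</div><div class="option">225</div></a>
    <a class="optionLine"><div class="number">2</div><div class="option">230</div></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag.
first = soup.find("div", class_="option")

# find_all() returns every match, in document order.
every = soup.find_all("div", class_="option")

# When nothing matches, find() returns None ("no-such-class" is hypothetical).
missing = soup.find("div", class_="no-such-class")

first_text = first.text
all_texts = [tag.text for tag in every]

print(first_text)  # 225
print(all_texts)   # ['225', '230']
print(missing)     # None
```

Because find() can return None, chaining .text straight onto it fails on pages where the element is absent, so a check for None is worth adding when crawling real pages.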

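The find_all() method returns a bs4.element.ResultSet, which is a list-like collection of bs4.element.Tag objects. The ResultSet itself does not have a find_all() method, so calling it on the result raises an AttributeError. You have to call find() or find_all() on each Tag inside the ResultSet instead.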
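The sketch below reproduces the error on a shortened copy of the markup and then shows two ways around it: searching inside each Tag, and, as an alternative not used in the answer above, a single CSS selector passed to select(). The shortened HTML and the variable names are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Shortened copy of the cleaned-up markup (attributes trimmed for brevity).
html = """
<div class="box" id="option_all_view_area">
    <a class="optionLine"><div class="number">1</div><div class="option">225</div></a>
    <a class="optionLine"><div class="number">2</div><div class="option">230</div></a>
    <a class="optionLine"><div class="number">3</div><div class="option">235</div></a>
    <a class="optionLine"><div class="number">4</div><div class="option">240</div></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
option_lines = soup.find_all("a", class_="optionLine")  # a ResultSet of Tags

# Calling find_all() on the ResultSet itself fails, as in the question.
try:
    option_lines.find_all("div", class_="option")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Way 1: search inside each Tag of the ResultSet.
sizes_loop = [a.find("div", class_="option").text for a in option_lines]

# Way 2: one CSS selector for "div.option inside a.optionLine".
sizes_css = [div.text for div in soup.select("a.optionLine div.option")]

print(error_name)  # AttributeError
print(sizes_loop)  # ['225', '230', '235', '240']
print(sizes_css)   # ['225', '230', '235', '240']
```

Both ways give the same list here. The selector version is shorter, while the loop version makes it easier to also read the number <div> that belongs to each option.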


2022-09-21 11:31

© 2024 OneMinuteCode. All rights reserved.