Scraping only a specified layer (range) of the HTML

Asked 1 year ago, Updated 1 year ago, 132 views

I would like to limit scraping to a specific part of the page.
The only idea I have is a vague one: cut out part of the document (parsed with lxml) using find() and the like, and then make a new soup from that fragment.

<div id="foo">
    <a href="*1">
        <img src="*1.jpg" class="bar"/>
    </a> 
    <a href="*2">
        <img src="*2.jpg" class="bar"/>
    </a> 
    <a href="*3">
        <img src="*3.jpg" class="bar"/>
    </a> 
</div>

<img src="*4.jpg" class="bar"/>

When a layer like this appears somewhere in the HTML, I want to get only the URLs of the images inside it (*1.jpg, *2.jpg, *3.jpg).
However, there are also images outside the layer whose class is bar (such as *4.jpg), so selecting by class alone picks those up as well.
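The "cut out with find(), then make soup again" idea can be made concrete roughly as follows. This is a minimal sketch that assumes the markup shown above and uses Python's built-in "html.parser" so it runs without lxml installed (passing "lxml" instead should behave the same here). It shows that searching the whole document by class also catches *4.jpg, while searching inside the Tag returned by find() does not. Re-parsing the fragment into a new soup also works, but is not strictly needed, because a Tag has its own find_all().

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the *N placeholders are kept as-is.
html = """
<div id="foo">
    <a href="*1"><img src="*1.jpg" class="bar"/></a>
    <a href="*2"><img src="*2.jpg" class="bar"/></a>
    <a href="*3"><img src="*3.jpg" class="bar"/></a>
</div>
<img src="*4.jpg" class="bar"/>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching the whole document by class also picks up *4.jpg.
all_srcs = [img["src"] for img in soup.find_all("img", class_="bar")]

# Narrow the scope first: find() returns a Tag, and find_all() on that
# Tag only searches inside <div id="foo">.
container = soup.find("div", id="foo")
scoped_srcs = [img["src"] for img in container.find_all("img", class_="bar")]

# The literal "make soup again" variant gives the same result.
resoup = BeautifulSoup(str(container), "html.parser")
resoup_srcs = [img["src"] for img in resoup.find_all("img", class_="bar")]

print(all_srcs)     # ['*1.jpg', '*2.jpg', '*3.jpg', '*4.jpg']
print(scoped_srcs)  # ['*1.jpg', '*2.jpg', '*3.jpg']
```

In practice you would also check that find() did not return None before calling find_all() on it, in case the layer is missing from a given page.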

What approaches could I use?

Environment: Python 3.6.2
For the scraping itself, I would like to use BeautifulSoup4 and avoid introducing Selenium.

python html web-scraping beautifulsoup

2022-09-30 19:31

1 Answer

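You can retrieve only the elements inside the element whose id is foo by passing a CSS selector to soup.select(), for example #foo > a > .bar. Note that the child combinator > matches direct children only: each img here is nested inside an a, so #foo > .bar by itself would match nothing, whereas the descendant form #foo .bar matches them at any depth.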
Child combinator | MDN
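A minimal sketch of the selector approach, assuming the markup from the question and a reasonably recent BeautifulSoup (4.7 or later, where select() is backed by the soupsieve package). It compares the explicit child chain, the descendant form, and the direct-child-only form that does not match this structure.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the *N placeholders are kept as-is.
html = """
<div id="foo">
    <a href="*1"><img src="*1.jpg" class="bar"/></a>
    <a href="*2"><img src="*2.jpg" class="bar"/></a>
    <a href="*3"><img src="*3.jpg" class="bar"/></a>
</div>
<img src="*4.jpg" class="bar"/>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinators all the way down: div#foo > a > .bar
child_srcs = [img["src"] for img in soup.select("#foo > a > .bar")]

# Descendant combinator: any .bar at any depth inside #foo.
desc_srcs = [img["src"] for img in soup.select("#foo .bar")]

# "#foo > .bar" requires .bar to be a direct child of #foo; every img
# here sits inside an <a>, so this returns an empty list.
direct_only = soup.select("#foo > .bar")

print(child_srcs)   # ['*1.jpg', '*2.jpg', '*3.jpg']
print(desc_srcs)    # ['*1.jpg', '*2.jpg', '*3.jpg']
print(direct_only)  # []
```

The explicit chain is stricter and documents the expected structure; the descendant form is more forgiving if the markup inside the layer changes slightly.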


2022-09-30 19:31



© 2024 OneMinuteCode. All rights reserved.