Scraping only within a specified element

I would like to limit the scope of what gets scraped.
All I have in mind is a vague idea of cutting out part of the document with lxml using find or similar, and then making a soup from that part again.

<div id="foo">
    <a href="*1">
        <img src="*1.jpg" class="bar"/>
    </a> 
    <a href="*2">
        <img src="*2.jpg" class="bar"/>
    </a> 
    <a href="*3">
        <img src="*3.jpg" class="bar"/>
    </a> 
</div>

<img src="*4.jpg" class="bar"/>

If a block like this appears somewhere in the HTML, I would like to get only the URLs of the images contained in it (*1.jpg, *2.jpg, *3.jpg).
However, there are also images outside the block that have the same class, bar.

What approaches could I use to solve this?

Environment:
Python 3.6.2
For scraping, I would like to use BeautifulSoup4 and avoid introducing Selenium.

python html web-scraping beautifulsoup

2022-09-30 19:31

1 Answer

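You can retrieve only the elements inside the element with id foo by using a CSS combinator in the selector. Note that in the HTML above the img elements are not direct children of #foo (they sit inside the a elements), so the child selector #foo>.bar alone would not match them. Use the descendant selector #foo .bar, or chain child combinators as #foo>a>.bar.
Child combinator | MDN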
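
As a minimal sketch of the selector approach, assuming BeautifulSoup4 with the built-in html.parser and the sample HTML from the question pasted in as a string (the variable names here are only for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup from the question, inlined as an assumption for this sketch.
HTML = """
<div id="foo">
    <a href="*1">
        <img src="*1.jpg" class="bar"/>
    </a>
    <a href="*2">
        <img src="*2.jpg" class="bar"/>
    </a>
    <a href="*3">
        <img src="*3.jpg" class="bar"/>
    </a>
</div>

<img src="*4.jpg" class="bar"/>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Descendant selector: any .bar at any depth inside #foo.
urls_descendant = [img["src"] for img in soup.select("#foo .bar")]

# Chained child combinators: .bar directly inside an <a> directly inside #foo.
urls_child_chain = [img["src"] for img in soup.select("#foo > a > .bar")]

# A single child combinator matches nothing here, because the images
# are grandchildren of #foo rather than direct children.
urls_direct_child = [img["src"] for img in soup.select("#foo > .bar")]

# The "find first, then search inside" idea from the question also works:
# find_all on a Tag only looks at that tag's descendants.
container = soup.find("div", id="foo")
urls_find = (
    [img["src"] for img in container.find_all("img", class_="bar")]
    if container is not None
    else []
)

print(urls_descendant)
```

There is no need to build a second soup from the extracted part: calling select or find_all on the div itself already limits the search to that element.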

2022-09-30 19:31
