I would like to restrict the part of a page that gets scraped.
All I have in mind is a vague idea of cutting out part of the lxml-parsed document with find and so on, and then making a soup again from that part.
<div id="foo">
<a href="*1">
<img src="*1.jpg" class="bar"/>
</a>
<a href="*2">
<img src="*2.jpg" class="bar"/>
</a>
<a href="*3">
<img src="*3.jpg" class="bar"/>
</a>
</div>
<img src="*4.jpg" class="bar"/>
If a block like this exists in part of the HTML, I would like to get only the URLs of the images (*1.jpg, *2.jpg, *3.jpg) contained in that block.
However, there are also images outside the block whose class is bar as well.
What approaches could I take to solve this?
The environment is:
Python 3.6.2
For scraping, I would like to use BeautifulSoup4 and avoid introducing Selenium.
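You can retrieve only the elements inside the element whose id is foo by using a CSS selector.
Note that the child selector #foo>.bar matches only direct children of #foo. In the sample HTML the images are nested inside a elements, so use #foo>a>.bar, or the descendant selector #foo .bar (with a space), which matches at any depth.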
Child combinator | MDN
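
As a minimal sketch of the selector approach with BeautifulSoup4, assuming the sample HTML from the question is available as a string (the variable names html, urls, urls_child and urls_find are made up for illustration, and the built-in html.parser is used so that lxml is not required):

```python
from bs4 import BeautifulSoup

# Sample markup from the question; "*N" values are kept as placeholders.
html = """
<div id="foo">
<a href="*1"><img src="*1.jpg" class="bar"/></a>
<a href="*2"><img src="*2.jpg" class="bar"/></a>
<a href="*3"><img src="*3.jpg" class="bar"/></a>
</div>
<img src="*4.jpg" class="bar"/>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every .bar element at any depth inside #foo.
urls = [img["src"] for img in soup.select("#foo .bar")]

# Child selectors only: each ">" step must be a direct parent/child pair.
urls_child = [img["src"] for img in soup.select("#foo > a > .bar")]

# find-based alternative: narrow down to the container first,
# then search only inside it (no need to build a second soup).
container = soup.find(id="foo")
urls_find = []
if container is not None:
    urls_find = [img["src"] for img in container.find_all("img", class_="bar")]

print(urls)  # ['*1.jpg', '*2.jpg', '*3.jpg']
```

All three variants skip *4.jpg because it sits outside the element with id foo. The find-based variant is close to the idea in the question, except that the Tag returned by find can be searched directly without re-parsing.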