I'm Parin, and I'm learning Python.
There is a part of a program I currently use that I want to modify, but I can't get it to work, so I'm asking for help.
First, here is the program as it stands. As the output at the bottom shows, you pass it a URL and it prints the path of every image file on that page.
In that output, an image hosted on an external domain, such as https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg, is printed as a full URL starting with https.
An image such as /assets/bi-programmers-dark-52dd3a63ce83b85f0d3a11a3deefc153ce4f012b7ff7dc01d71a589ab6923.png, however, is printed without the leading part of the URL, because it is on the same domain as the page I requested. I want to use an if statement so that when the path has no http part, the page's URL is added in front, but I can't get it to work.
The program code currently in use.
import argparse
import sys

import requests
from bs4 import BeautifulSoup, SoupStrainer

def scraper(link):
    res = requests.get(link)
    # Parse only the <img> tags and print each src exactly as it appears in the page.
    for image in BeautifulSoup(res.text, 'html.parser', parse_only=SoupStrainer('img')):
        src = image.get('src')
        print('Image: ' + str(src))

def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('link', help='url of page to scrape')
    return parser

def main(args):
    parser = create_parser()
    args = parser.parse_args(args)
    url = args.link
    return scraper(url)

if __name__ == '__main__':
    main(sys.argv[1:])
Output
Image: /assets/bi-programmers-dark-52dd3a63ce83b85f0d3a11a3deefc153ce4f012b7ff7dc01d71a589199ab6923.png
Image: /assets/bi-symbol-dark-c220f888ccd0fd2603f00300cb54bb88d0246bd239c974a637147a4f26fac3d2.png
Image: https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg
Image: https://farm5.static.flickr.com/4134/4741286721_8770fe8879_s.jpg
Image: https://farm5.static.flickr.com/4093/4741286719_e4fa9ec414_s.jpg
Image: https://farm5.static.flickr.com/4081/4741286717_1e1a8ff4da_s.jpg
Image: https://farm5.static.flickr.com/4142/4741286715_ac0d603b07_s.jpg
Please refer to the following.
import requests, bs4
_url = 'http://www.kldp.org'
content = requests.get(_url).content
imgs = bs4.BeautifulSoup(content, 'html.parser', parse_only=bs4.SoupStrainer('img'))
# Add the request URL if the image URL does not start with http.
img_urls = (img['src'] if img['src'].startswith('http') else f"{_url}{img['src']}" for img in imgs)
for url in img_urls:
    print(url)
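As an alternative to the startswith check, the standard library's urllib.parse.urljoin can do the same job without an explicit if statement. The sketch below is only an illustration: the helper name absolute_image_url and the sample paths marked hypothetical are not from the original post. urljoin leaves an already absolute URL unchanged and resolves anything else against the page URL, so it should also cope with paths that do not start with a slash and with protocol-relative URLs that start with //.

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img> src against the URL of the page it came from.

    The function name is hypothetical; urljoin itself is standard library.
    """
    # urljoin returns src unchanged when it is already an absolute URL,
    # and otherwise resolves it relative to page_url.
    return urljoin(page_url, src)


if __name__ == "__main__":
    page = "http://www.kldp.org"  # same request URL as in the answer above
    samples = (
        "/assets/logo.png",  # hypothetical root-relative path
        "https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg",
        "//farm5.static.flickr.com/a.jpg",  # hypothetical protocol-relative URL
    )
    for src in samples:
        print("Image: " + absolute_image_url(page, src))
```

In the original scraper, this would presumably replace the print line with something like print('Image: ' + absolute_image_url(link, str(src))), assuming link is the URL that was passed to requests.get.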