A simple code question...

I'm Parin, and I'm learning Python in various ways.

There is a program I currently use, and there is a part of it I want to modify, but I can't get it to work, so I'm asking for help.

First, here is the code currently in use. As the results at the bottom show, when you pass in the URL of a page, the program extracts the path of each image file on that page.

For an image hosted on an external site, such as https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg, the output shows the entire URL, starting from https.

For an image such as /assets/bi-programmers-dark-52dd3a63ce83b85f0d3a11a3deefc153ce4f012b7ff7dc01d71a589199ab6923.png, the leading part of the URL is omitted because the image is on the same domain as the page I requested. So I want to use an if statement that says: if the path has no http part, prepend the link. But it doesn't work.

The program code currently in use.


import argparse
import sys

import requests
from bs4 import BeautifulSoup, SoupStrainer


def scraper(link):
    res = requests.get(link)

    for image in BeautifulSoup(res.text, 'html.parser', parse_only=SoupStrainer('img')):
        src = image.get('src')
        print('Image: ' + str(src))


def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('link', help='url of page to scrape')
    return parser


def main(args):
    parser = create_parser()
    args = parser.parse_args(args)
    url = args.link
    return scraper(url)


if __name__ == '__main__':
    main(sys.argv[1:])

Result value


Image: /assets/bi-programmers-dark-52dd3a63ce83b85f0d3a11a3deefc153ce4f012b7ff7dc01d71a589199ab6923.png
Image: /assets/bi-symbol-dark-c220f888ccd0fd2603f00300cb54bb88d0246bd239c974a637147a4f26fac3d2.png
Image: https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg
Image: https://farm5.static.flickr.com/4134/4741286721_8770fe8879_s.jpg
Image: https://farm5.static.flickr.com/4093/4741286719_e4fa9ec414_s.jpg
Image: https://farm5.static.flickr.com/4081/4741286717_1e1a8ff4da_s.jpg
Image: https://farm5.static.flickr.com/4142/4741286715_ac0d603b07_s.jpg

if statement

2022-09-22 16:16

1 Answer

Please refer to the following.

import requests, bs4

_url = 'http://www.kldp.org'

content = requests.get(_url).content
imgs = bs4.BeautifulSoup(content, 'html.parser', parse_only=bs4.SoupStrainer('img'))
# Add the request URL if the image URL does not start with http.
img_urls = (img['src'] if img['src'].startswith('http') else f"{_url}{img['src']}" for img in imgs)

for url in img_urls: print(url)
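
As a possible alternative to the startswith check above, the standard library's urllib.parse.urljoin can resolve each src against the page URL. The sketch below is only an illustration: the helper name absolutize and the example.com page URL are made up for this example and are not part of the original program.

```python
from urllib.parse import urljoin


def absolutize(page_url, src):
    """Return an absolute image URL for a src value found on page_url."""
    # urljoin leaves an already absolute URL unchanged and resolves
    # relative values (such as /assets/...) against page_url.
    return urljoin(page_url, src)


# Hypothetical page URL, used only for illustration.
page = 'https://example.com/gallery/index.html'
samples = [
    '/assets/logo.png',
    'thumb.jpg',
    'https://c1.staticflickr.com/5/4005/34867024234_53b7383815_s.jpg',
]
for src in samples:
    print('Image: ' + absolutize(page, src))
```

In the question's scraper, the same call could replace str(src) in the print line, with link as the first argument. Unlike plain string concatenation, this also handles a src without a leading slash and a src that starts with //.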

2022-09-22 16:16
