I have a question about how to crawl an image using Beautiful Soup

So far, the images I've crawled were in <img> tags like the one below, so I took the src URL from the tag and saved the image from it.

html code <img src="http://image.yes24.com/goods/89987423/800x0" alt="12 constellation man" border="0">

Code used for crawling

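import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(r'C:\chromedriver\chromedriver.exe')
# (the target page must be loaded with driver.get(...) before reading page_source)

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

# Find the <img> tag inside the wrapper div, then read its attributes
img = soup.find('div', {'class': 'img_Bdr'}).find('img')
img_url = img['src']
img_name = img['alt']
# Replace characters that are not allowed in file names before saving
urllib.request.urlretrieve(img_url, "dir/" + str(img_name.strip().replace("/", ",").replace('"', "'").replace(":", "-").replace(">",  ")").replace("<", "(").replace("?", "")) + '.jpg')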

However, the site I want to crawl now sets the image through a CSS background-image in a style attribute, like this:

html code background-image: url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65');

How can I extract this image URL and save the image?

beautifulsoup selenium

2022-09-20 19:54

1 Answer

import requests
from bs4 import BeautifulSoup as bs
from parse import parse  # pip install parse

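def filesave(url):
    """Download url and save it as a .jpg file; return the path, or 0 on failure."""
    try:
        # Last path segment, cut at the first '.', so the extension and query string are dropped
        urlsplit = url.split('/')[-1]
        urlsplit = urlsplit.split('.')[0] # :D
        name = 'C:/Users/User/hi/'+urlsplit
        bn = requests.get(url).content
        # JPEG data starts with the bytes FF D8 FF
        if bn[0:3] != b'\xff\xd8\xff':
            print('this file is not JPEG file format')
            return 0
        if 'jpg' not in urlsplit:
            name += '.jpg'
        # The with block closes the file even if the write fails
        with open(name, 'wb') as f:
            f.write(bn)
        print(f'[!] {name} saved')
        return name
    except Exception as e:
        print(e)
        return 0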

def main(url):
    s = bs(requests.get(url).text, 'html.parser')
    img = s.find('div', {'class':'article-img'})
    result = parse("background-image: url('{}');", img['style'])[0] # :D
    filesave(result)

if __name__ == "__main__":
    main('https://fhjyang543.postype.com/series/457430/%EB%82%B4%EA%B0%80-%EC%82%AC%EB%9E%91%ED%95%9C-%EC%8B%A0%EC%97%90%EA%B2%8C')
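
If you would rather not install the parse package, the same URL can probably be pulled out of the style attribute with a regular expression from the standard library. This is only a minimal sketch and not part of the code above: the names extract_background_url and filename_from_url are made up for illustration, and the sample style string is the one from the question. It also drops the query string (?w=288&h=384&q=65) before building a file name.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches url('...'), url("...") or url(...) in a background-image declaration.
_BG_URL = re.compile(r"""background-image\s*:\s*url\(\s*['"]?(.*?)['"]?\s*\)""", re.IGNORECASE)


def extract_background_url(style):
    """Return the URL from a background-image declaration, or None if absent."""
    match = _BG_URL.search(style or "")
    return match.group(1) if match else None


def filename_from_url(url):
    """Return the last path segment of url, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


style = ("background-image: url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/"
         "d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65');")
url = extract_background_url(style)
print(url)
print(filename_from_url(url))
```

In main() above, img['style'] would be passed to extract_background_url in place of the parse(...) call. A None result means the div has no background-image in its style attribute.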

You should have told me that you changed the post.

I slightly modified the code from my answer to your previous question. Please take a look and use it as a reference. (The lines marked with a # :D comment are the ones I changed.)

Also, without knowing the HTML of the page being crawled, there is a limit to how much anyone can help.

Next time, when you ask about a URL that falls within what the site's robots.txt allows, please include the address of the site.
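
For reference, whether a URL falls within what robots.txt allows can be checked with urllib.robotparser from the standard library. This is a minimal sketch: the rules and the example.com URLs below are made up for illustration and passed in directly so that it runs offline. For a real site you would call set_url(...) and read() on the parser instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether any crawler ("*") may fetch each URL under the rules above.
allowed = parser.can_fetch("*", "https://example.com/series/1")
blocked = parser.can_fetch("*", "https://example.com/private/page")
print(allowed, blocked)
```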

2022-09-20 19:54
