The images I've crawled so far look like the one below, so I saved each image by pulling the src out of the <img> tag.
HTML code
<img src="http://image.yes24.com/goods/89987423/800x0" alt="12 constellation man" border="0">
Code used for crawling
from selenium import webdriver
from bs4 import BeautifulSoup
import urllib.request

driver = webdriver.Chrome(r'C:\chromedriver\chromedriver.exe')
driver.get(page_url)  # page_url: the address of the page being crawled (not shown in the question)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
img_tag = soup.find('div', {'class': 'img_Bdr'}).find('img')
img_url = img_tag['src']
img_name = img_tag['alt']
# replace characters Windows forbids in file names, then download
urllib.request.urlretrieve(img_url, "dir/" + img_name.strip().replace("/", ",").replace('"', "'").replace(":", "-").replace(">", ")").replace("<", "(").replace("?", "") + '.jpg')
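As an aside, the chain of replace() calls above just strips out characters that Windows disallows in file names; a more compact sketch of the same idea with re.sub (the uniform '-' replacement is an assumption, not the exact per-character mapping above):

import re

def sanitize(name):
    # swap any character Windows forbids in file names for '-'
    return re.sub(r'[\\/:*?"<>|]', '-', name.strip())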
However, the site I want to crawl now serves its image as a CSS background instead:
CSS code
background-image: url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65');
How can I fetch and save the image in this case?
beautifulsoup selenium
import requests
from bs4 import BeautifulSoup as bs
from parse import *  # pip install parse

def filesave(url):
    try:
        urlsplit = url.split('/')[-1]      # last path segment of the URL
        urlsplit = urlsplit.split('.')[0]  # drop the extension and the query string
        name = 'C:/Users/User/hi/' + urlsplit
        bn = requests.get(url).content
        if bn[0:3] != b'\xff\xd8\xff':     # JPEG files start with these three magic bytes
            print('this file is not JPEG file format')
            return 0
        else:
            if 'jpg' not in urlsplit:
                name += '.jpg'
            f = open(name, 'wb')
            f.write(bn)
            f.close()
            print(f'[!] {name} saved')
            return name
    except Exception as e:
        print(e)
        return 0

def main(url):
    s = bs(requests.get(url).text, 'html.parser')
    img = s.find('div', {'class': 'article-img'})
    result = parse("background-image: url('{}');", img['style'])[0]  # extract the URL from the inline style
    filesave(result)

if __name__ == "__main__":
    main('https://fhjyang543.postype.com/series/457430/%EB%82%B4%EA%B0%80-%EC%82%AC%EB%9E%91%ED%95%9C-%EC%8B%A0%EC%97%90%EA%B2%8C')
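If you'd rather not add the parse dependency, the URL can also be extracted with the standard-library re module; a minimal sketch, assuming the single-quoted url('...') form shown in the question:

import re

style = "background-image: url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65');"
m = re.search(r"url\('([^']+)'\)", style)
if m:
    print(m.group(1))  # the image URL, ready to hand to filesave()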
You should have told me that you changed the post.
I've revised my earlier question a bit; please take a look. (The commented part got garbled...)
Also, if we can't see the HTML of the page you're crawling, there's a limit to how much we can help.
Next time... please include the address of the site in question when you ask about a URL (within what its robots.txt allows).