Hello.
When studying Python web crawling, use the findall
function and after F12 > ctrl + shift + c
I don't know what to put in the HTML structure
I have a question.
The site is https://chilgok.fowi.or.kr/facility/admin.do
.
The photo code I'm trying to put in is <img src="/images/facility/admin photo05.jpgalt="library thumbnail">
The coding is as follows
import urllib.request as req
import re
rep = req.urlopen('https://chilgok.fowi.or.kr/facility/admin.do')
data = rep.read().decode('utf8')
result = re.foundall('+.jpg',data) ######### I didn't know what to put in this part.
for link in result:
idx = link.rfind('/')
with open(link[idx+1:], "wb") as f:
pic = req.urlopen(link)
f.write( pic.read() )
Source: https://www.youtube.com/watch?v=NKE0ozQ1Esw
I never use it commercially.
python crawling html css
The site was created using jsp and uses a relative path, so you can't get it directly through the entire path like a video.
I checked and found that the location where the image was saved should be loaded after https://chilgok.fowi.or.kr, so I modified the code a little bit.
Additionally, I added ssl because I couldn't access the site because of https security when I accessed the site with urllib.
Below is the full text of the code.
import urllib.request as req
import re
import ssl
context = ssl._create_unverified_context()
url = 'https://chilgok.fowi.or.kr'
active = '/facility/admin.do'
rep = req.urlopen(url+active,context=context)
data = rep.read().decode('utf8')
result = re.findall('/images.+jpg', data)
# Regular expression "/images" text + at least one letter + "jpg" text
# The "/images/facility/admin_photo01.jpg" image is a relative path, so you need to find the root path. In this case, "https://chilgok.fowi.or.kr"
for link in result:
print(url+link)
# Root Path + Relative Path
imgUrl = url + link
idx = imgUrl.rfind('/')
fileName = imgUrl[idx+1:]
print(fileName)
# If there is an error in the middle, please check how far you receive the normal value.
with open(fileName, "wb") as f:
pic = req.urlopen(imgUrl,context=context) # Here too (https - ssl)
f.write( pic.read() )
I didn't know ctrl + shift + c, but I knew it was good.
To learn more about web crawling, see Beautiful Soup.
Thank you.
© 2024 OneMinuteCode. All rights reserved.