I'm sorry, I'm a Python beginner. I'd like to pull information from a website in one go, but the page has various menus; is there a way to get everything at once?
I think this is scraping basics, but I'd appreciate any help.
The web page is the one in the URL below.
I tried the following as a template; it raises no errors, but nothing gets retrieved.
# For saving
driver_path = r'C:\Anaconda3\chromedriver.exe'  # my chromedriver location
# URL of the page to load
URL = 'https://nintei.nurse.or.jp/certification/General/(X(1)S(efl0y555pect3x45oxjzfw3x))/GCPP01LS/GCPP01LS.aspx'
# Folder you want to store the files in
send_path = r'C:\Users\akira\Documents\Python\Company'
from selenium import webdriver
import time
import bs4
import re
import os
import shutil
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
start = time.time()
driver = webdriver.Chrome(driver_path)
driver.get(URL)
time.sleep(3)
soup = bs4.BeautifulSoup(driver.page_source, 'html5lib')
base = 'https://nintei.nurse.or.jp/certification/General/'
soup_file1 = soup.find_all('a')
href_list = []
file_num = 1
sum_file = 1
cc = 0
for s in soup_file1:
    if s.string == 'Search':
        path = base + s.get('href')
        href_list.append(path)
        print(path)
        driver.get(path)
        WebDriverWait(driver, 300).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_plhContent_btnSearchMain"]')))
        driver.find_element_by_xpath('//*[@id="ctl00_plhContent_btnSearchMain"]').click()
        # Busy-wait until a new file appears in the download folder
        while sum_file == file_num:
            sum_file = len(os.listdir(r'C:\Users\akira\Downloads'))
        else:
            print("Current number of download files_{}".format(sum_file - 1))
            file_num += 1
            cc += 1
# Allow some time so temporary download files do not get in the way
time.sleep(60)
# Moving the files out of the download folder
dw_path = r'C:\Users\akira\Downloads'
dw_list = os.listdir(dw_path)
dw_xlsx = [f for f in dw_list]  # every file in the folder (filter here if needed)
for dw in dw_xlsx:
    shutil.move(os.path.join(dw_path, dw), send_path)
The page contains several hidden parameters (for example, __EVENTVALIDATION). Obtain the values of these parameters on the first access and submit them together with the form data.
Below is a Python script that does this, but the response (the HTML) contains only the first 50 search results. To get all of the results you also have to follow the link for each results page and retrieve its HTML; a sketch of that follows the script.
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

# first access: get the hidden parameters
url = r'https://nintei.nurse.or.jp/certification/General/(X(1)S(efl0y555pect3x45oxjzfw3x))/General/GCPP01LS/GCPP01LS.aspx'
html = urllib.request.urlopen(url)
soup = BeautifulSoup(html, 'lxml')
# generate the form data
count = int(soup.select('#__VIEWSTATEFIELDCOUNT')[0]['value'])
form_data = {
    '__VIEWSTATEFIELDCOUNT': count,
    '__VIEWSTATE': soup.select('#__VIEWSTATE')[0]['value'],
    '__EVENTVALIDATION': soup.select('#__EVENTVALIDATION')[0]['value']
}
# ASP.NET splits the view state into __VIEWSTATE1 ... __VIEWSTATE{count-1}
for i in range(1, count):
    form_data[f'__VIEWSTATE{i}'] = soup.select(f'#__VIEWSTATE{i}')[0]['value']
form_data['ctl00$plhContent$btnSearchMain'] = 'Search'
form_data['ctl00$plhContent$drpField'] = -1
form_data['ctl00$plhContent$drpNameOwnerWorking'] = -1
form_data['ctl00$plhContent$drpWorkPrefecture'] = -1
form_data['ctl00$plhContent$drpWorkType'] = -1
form_data['ctl00$plhContent$radlstCert'] = 1
# second access: get the search results
form_data = urllib.parse.urlencode(form_data).encode()
html = urllib.request.urlopen(url, form_data).read().decode()
print(html)
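For the remaining pages: on an ASP.NET page like this one, "following the link for each page" means re-posting the form with __EVENTTARGET and __EVENTARGUMENT taken from the pager links. The following is only a sketch of that idea; it assumes the pager links are rendered as javascript:__doPostBack('target','argument') and are labelled with the page number, neither of which is confirmed against the live page.

import re
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

def hidden_fields(soup):
    # Re-collect every hidden ASP.NET field from the current response
    return {inp['name']: inp.get('value', '')
            for inp in soup.select('input[type=hidden]') if inp.get('name')}

pages = [html]                      # html is the first result page from above
soup = BeautifulSoup(html, 'lxml')
page = 2
while True:
    link = soup.find('a', string=str(page))  # pager link labelled "2", "3", ...
    if link is None:
        break                                # no further pages
    # Pull the two __doPostBack arguments out of the href
    target, argument = re.findall(r"'([^']*)'", link['href'])[:2]
    data = hidden_fields(soup)
    data['__EVENTTARGET'] = target
    data['__EVENTARGUMENT'] = argument
    body = urllib.parse.urlencode(data).encode()
    html = urllib.request.urlopen(url, body).read().decode()
    pages.append(html)
    soup = BeautifulSoup(html, 'lxml')
    page += 1
print(len(pages), 'pages collected')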
There are many different ways a website can hold and display its information, and there is no such thing as a template that can be applied across different sites, even ones built by the same company for the same purpose.
Forget the idea that such a template exists and start by looking at the content and structure of the target site.
Your template collects the page's a tags and clicks a "Search" link found among them, but that does not match how this page is actually built. Note:
Full data published officially (available as an Excel sheet):
Information Processing Security Advisor Search Service
An example that can be retrieved with behind-the-scenes techniques (the comments show the technique):
Python scraping: table cannot be retrieved
I am also including reference information for obtaining the data by scraping:
Scraping with Selenium (handles the next page)
Here's how to approach it:
Review your code from the base='https://nintei.nurse.or.jp/certification/General/' line onward: click the search button with driver.find_element_by_xpath('//*[@id="ctl00_plhContent_btnSearchMain"]').click(), then read the results from the element at '//*[@id="ctl00_plhContent_dlvMain"]'.
(Wrap it in try:/except: if that is convenient, or use another approach if you find one easier.)
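Putting that advice together, here is a minimal Selenium sketch of the approach. The two XPaths are the ones quoted above; the "next page" locator and the row handling are assumptions you would adapt to the pager the page actually renders.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = 'https://nintei.nurse.or.jp/certification/General/(X(1)S(efl0y555pect3x45oxjzfw3x))/GCPP01LS/GCPP01LS.aspx'
driver = webdriver.Chrome(r'C:\Anaconda3\chromedriver.exe')
driver.get(URL)

# Click the search button once it becomes clickable
WebDriverWait(driver, 30).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_plhContent_btnSearchMain"]')))
driver.find_element_by_xpath('//*[@id="ctl00_plhContent_btnSearchMain"]').click()

rows = []
while True:
    # Read the current page of results from the dlvMain element
    table = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="ctl00_plhContent_dlvMain"]')))
    rows.extend(tr.text for tr in table.find_elements_by_tag_name('tr'))
    try:
        # Assumption: the pager shows a ">" link for the next page;
        # change the locator to whatever the real pager contains.
        driver.find_element_by_link_text('>').click()
        time.sleep(1)  # crude wait for the postback; a staleness check is more robust
    except Exception:
        break  # no next-page link left, so we are done

print('collected rows:', len(rows))
driver.quit()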