Hello, I'm a beginner who is practicing automatic login and crawling using Python.
I tried to crawl the information I wanted by logging in to a specific site, but it was blocked.
I logged in automatically through Selenium.
The problems are as follows:
The information I want is 20121206-504, shown at the bottom.
If you look at the page source for 20121206-504 in the second image, it looks like the first image above.
What code should I write to extract it with Beautiful Soup? The implementation code is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from time import sleep
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
# Open the login page
driver = webdriver.Chrome()
driver.get("https://address")
sleep(1)
# Enter the credentials and submit the login form
driver.find_element(By.NAME, 'regi_no').send_keys('ID')
driver.find_element(By.NAME, 'pass').send_keys('password')
driver.find_element(By.XPATH, '/html/body/div/form/center/input[1]').click()
sleep(3)
# Click the button inside each of the four elements myModal01 to myModal04
driver.find_element(By.XPATH, '//*[@id="myModal01"]/div/div/div[3]/button').click()
sleep(2)
driver.find_element(By.XPATH, '//*[@id="myModal02"]/div/div/div[3]/button').click()
sleep(2)
driver.find_element(By.XPATH, '//*[@id="myModal03"]/div/div/div[3]/button').click()
sleep(2)
driver.find_element(By.XPATH, '//*[@id="myModal04"]/div/div/div[3]/button').click()
sleep(2)
# Open the grcode dropdown and choose its second option
driver.find_element(By.NAME, 'grcode').click()
sleep(2)
driver.find_element(By.XPATH, '/html/body/table[3]/tbody/tr[3]/td[1]/p/font/span/select/option[2]').click()
Once the web driver has reached the page you want, get the page source from the driver and parse it with Beautiful Soup.
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
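From there, one possible way to pull out a value like 20121206-504 is to search the parsed tree with a regular expression. This is only a rough sketch: the real page's markup is not shown in the question, so the sample_html string below is a made-up stand-in for driver.page_source, and both the find_codes helper and the "eight digits, hyphen, three digits" pattern are assumptions based on the single example value. It checks text nodes as well as value attributes, since the code might sit in either place.

```python
import re
from bs4 import BeautifulSoup

# Assumed format of the wanted code: 8 digits, a hyphen, 3 digits (e.g. 20121206-504).
CODE_PATTERN = re.compile(r"\d{8}-\d{3}")

def find_codes(html):
    """Hypothetical helper: return (codes found in text, codes found in value attributes)."""
    soup = BeautifulSoup(html, "html.parser")
    # Text nodes whose content contains the pattern.
    in_text = [CODE_PATTERN.search(s).group() for s in soup.find_all(string=CODE_PATTERN)]
    # Tags whose "value" attribute contains the pattern (e.g. <option value="...">).
    in_values = [tag["value"] for tag in soup.find_all(attrs={"value": CODE_PATTERN})]
    return in_text, in_values

# Made-up markup standing in for driver.page_source; the real structure may differ.
sample_html = """
<table>
  <tr><td>
    <select name="grcode">
      <option value="">Select</option>
      <option value="20121206-504">20121206-504</option>
    </select>
  </td></tr>
</table>
"""

in_text, in_values = find_codes(sample_html)
print(in_text, in_values)
```

With the real page you would call find_codes(driver.page_source) instead. If the value is filled in by JavaScript after the page loads, the source may not contain it yet at the moment you read it, so the wait before reading page_source may need to be longer.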