I have a question about Selenium scraping.
I am currently writing a scraping script for the ↓ page.
https://www.gakujo.ne.jp/2022/company/baseinfo/22242/
environment
MacBook Pro
Google Chrome
Jupiter Notebook
Selenium
symptoms
A part of the headline ("profile") is automatically translated into English.
↓Parts acquired this time
script
driver.get('https://www.gakujo.ne.jp/2022/company/baseinfo/22242/')
company_profile=driver.find_element_by_css_selector('#ctl00_ContentPlaceHolder1_MyBaseInfoCtl_pnlProfile')
company_profile=company_profile.text
print(company_profile)
↓Results
profile
[Medical × People × Technology = 】] What is the attraction of companies that continue to provide new value to the medical industry?
HR talks "Pay attention here!"
"Leading company" challenges Japan's medical community
Operate [Private Medical Bureau] to fully support doctors and medical institutions
New proposals based on collective knowledge of medical xVR, medical x drone, medical x food, etc. (all of which are acquired in Japanese below)
見出しThe headline Profile and HR talks "Pay attention here!".
Supplementary
Depending on the page, the symptoms may vary depending on the page, such as the case where all text is acquired in English or the case where all text is acquired in Japanese.I have a question because I can't find a good solution even if I google it. (Do you think it has anything to do with Google translation or DeepL plug-in?)
If you have any knowledge, please give me some advice.
python python3 selenium selenium-webdriver
I solved this problem by setting the language setting for the headless Chrome to English → Japanese.
reference:
[Selenium] Set the language setting for the headless Chrome to English→Japanese
Add the following options
options.add_argument('--lang=ja-JP')
Thank you.
© 2024 OneMinuteCode. All rights reserved.