Scraping automatically translates some of the text into English

Asked 2 years ago, Updated 2 years ago, 459 views

I have a question about Selenium scraping.

I am currently writing a scraping script for the ↓ page.
https://www.gakujo.ne.jp/2022/company/baseinfo/22242/

environment
MacBook Pro Google Chrome Jupiter Notebook
Selenium

symptoms
A part of the headline ("profile") is automatically translated into English.
↓Parts acquired this time

Part Retrieved This Time

script

driver.get('https://www.gakujo.ne.jp/2022/company/baseinfo/22242/')
company_profile=driver.find_element_by_css_selector('#ctl00_ContentPlaceHolder1_MyBaseInfoCtl_pnlProfile')
company_profile=company_profile.text
print(company_profile)

↓Results

profile
[Medical × People × Technology = 】] What is the attraction of companies that continue to provide new value to the medical industry?
HR talks "Pay attention here!"
"Leading company" challenges Japan's medical community
Operate [Private Medical Bureau] to fully support doctors and medical institutions
New proposals based on collective knowledge of medical xVR, medical x drone, medical x food, etc. (all of which are acquired in Japanese below)

見出しThe headline Profile and HR talks "Pay attention here!".

Supplementary
Depending on the page, the symptoms may vary depending on the page, such as the case where all text is acquired in English or the case where all text is acquired in Japanese.I have a question because I can't find a good solution even if I google it. (Do you think it has anything to do with Google translation or DeepL plug-in?)

If you have any knowledge, please give me some advice.

python python3 selenium selenium-webdriver

2022-09-30 21:55

1 Answers

I solved this problem by setting the language setting for the headless Chrome to English → Japanese.

reference:

[Selenium] Set the language setting for the headless Chrome to English→Japanese

Add the following options

options.add_argument('--lang=ja-JP')

Thank you.


2022-09-30 21:55

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.