Background:
I'm a beginner with Selenium.
I'm using Selenium to retrieve information from a site.
The information is retrieved successfully when the browser is displayed, but when I run it in headless mode the following error occurs and the information cannot be retrieved.
Question:
How can I isolate the cause when Selenium behaves differently with the browser displayed than it does in headless mode?
How does everyone else approach this? (The only thing I can think of is taking a screenshot when an error occurs, which makes it hard to narrow down the cause.)
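One way to get more than a screenshot is to record the URL, title and HTML at the point of failure, so that a headed run and a headless run can be compared side by side. The following is only a minimal sketch: `dump_diagnostics`, `out_dir` and `label` are hypothetical names, and `driver` is assumed to be a Selenium WebDriver (only `current_url`, `title`, `page_source` and `save_screenshot` are used).

```python
import os
import time


def dump_diagnostics(driver, out_dir, label):
    """Save URL, title, HTML and a screenshot for later comparison."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime('%Y%m%d-%H%M%S')
    base = os.path.join(out_dir, '{0}-{1}'.format(label, stamp))

    # Text summary: the URL shows whether a redirect happened.
    with open(base + '.txt', 'w', encoding='utf-8') as f:
        f.write('url: {0}\n'.format(driver.current_url))
        f.write('title: {0}\n'.format(driver.title))

    # Full HTML, so the headed and headless runs can be diffed.
    with open(base + '.html', 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

    # Screenshot, as already mentioned in the question.
    driver.save_screenshot(base + '.png')
    return base
```

Calling it once with `label='headed'` and once with `label='headless'` at the same step gives two sets of files that can be compared with an ordinary diff tool.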
Environment:
Python 3.6
Selenium 3.141.0
Chrome Driver 74.0.3729.108
Supplemental:
I checked the screenshot taken when the error occurred and found that, when running headless, I was redirected to the login page when I tried to open a product page.
So, whenever I was redirected, I tried re-adding the cookies from the first successful login.
With that change, the information was sometimes retrieved successfully and sometimes not. (I'm a beginner, so this approach may itself be wrong; please point out any mistakes.)
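For reference, the redirect check and the cookie restore described here could look roughly like the sketch below. It rests on assumptions: the login page is assumed to live under a `/login` path, `cookies` is assumed to be a list previously saved from `driver.get_cookies()`, and the function names are hypothetical. The one Selenium-specific point is that `add_cookie` only works while the browser is on a page of the domain the cookie belongs to, so the sketch opens `base_url` first.

```python
from urllib.parse import urlsplit


def was_redirected_to_login(current_url, login_path='/login'):
    """Return True if the browser ended up on the (assumed) login path."""
    return urlsplit(current_url).path.startswith(login_path)


def restore_cookies(driver, cookies, base_url):
    """Re-add saved cookies after first opening a page on their domain."""
    driver.get(base_url)
    for cookie in cookies:
        driver.add_cookie(cookie)
```

Checking `was_redirected_to_login(driver.current_url)` right after `driver.get(item_link)` makes the redirect explicit, instead of relying on a later wait timing out.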
Partial code excerpt:
    for item_link in item_links:
        try:
            print('open{0}'.format(item_link))
            self.browser.get(item_link)
            # Reload once if the browser ended up on a different URL
            if self.browser.current_url != item_link:
                self.browser.get(item_link)
            # The locator must be passed as a single (By, value) tuple
            WebDriverWait(self.browser, 10).until(
                EC.presence_of_element_located((By.ID, 'container_img')))
            store_data.append(_get_product_info(item_link))
        except Exception:
            # Add cookies when redirected to the login page
            print('[ERROR02]exec retry')
            self.browser = cookie.addCookie(self.browser)
            print('open{0}'.format(item_link))
            self.browser.get(item_link)
            print(self.browser.title)
            WebDriverWait(self.browser, 10).until(
                EC.presence_of_element_located((By.ID, 'container_img')))
            store_data.append(_get_product_info(item_link))
    print(store_data)
Error Description:
[ERROR02] exec retry
Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 197, in read_category_page
    self.browser.get(item_link)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}), platform=Mac OS X 10.14.4 x86_64)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 239, in <module>
    style.read_category_page(sex_category, product_category)
  File "/03_hira/src/pages/StyleIsNow.py", line 207, in read_category_page
    self.browser=cookie.addCookie(self.browser)
  File "/03_hira/src/utils/Cookie.py", line 32, in addCookie
    self.browser.add_cookie(cookie)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 894, in add_cookie
    self.execute(Command.ADD_COOKIE, {'cookie': cookie_dict})
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}), platform=Mac OS X 10.14.4 x86_64)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 242, in <module>
    browser.quit()
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1055, in save_screenshot
    return self.get_screenshot_as_file(filename)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1032, in get_screenshot_as_file
    png = self.get_screenshot_as_png()
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1064, in get_screenshot_as_png
    return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1074, in get_screenshot_as_base64
    return self.execute(Command.SCREENSHOT)['value']
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}), platform=Mac OS X 10.14.4 x86_64)
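The "During handling of the above exception, another exception occurred" lines appear because the retry runs inside the `except` block, so a second failure is chained onto the first one and the output becomes hard to read. One possible restructuring, shown only as a sketch, is to run the recovery step after the handler has finished and to limit the number of retries. `fetch_with_retry`, `load` and `recover` are hypothetical names, not part of Selenium or of the original code.

```python
def fetch_with_retry(load, recover, retries=1, retry_on=(Exception,)):
    """Call load(); on failure call recover() and try again, up to `retries` times."""
    attempt = 0
    while True:
        try:
            return load()
        except retry_on as exc:
            if attempt >= retries:
                raise
            attempt += 1
            failure = exc
        # Recovery runs outside the except block, so an error raised here
        # is reported on its own instead of being chained to the first one.
        print('retry {0} after {1!r}'.format(attempt, failure))
        recover()
```

With Selenium, `retry_on` would typically be `(TimeoutException,)` from `selenium.common.exceptions`, `load` would open the page and wait for the element, and `recover` would re-add the cookies.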
The articles "When Headless Chrome is stuck" and "Tried running the headless chrome in Python" might be helpful.
Below is an option for adjusting the window size, taken from the former article.
A small window size can cause unintended results such as a broken HTML layout or overlapping DOM elements.
    options = Selenium::WebDriver::Chrome::Options.new({
      args: ['headless', 'start-maximized', 'window-size=1920,1080'],
    })
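Since the question uses Python, a rough Python equivalent of the Ruby snippet above might look like this. It is a sketch rather than a tested fix: the helper names are hypothetical, and the switches simply mirror the ones in the Ruby example. `Options.add_argument` and the `options=` keyword of `webdriver.Chrome` are the standard Selenium calls.

```python
def headless_chrome_args(width=1920, height=1080):
    """Chrome command-line switches mirroring the Ruby example."""
    return [
        '--headless',
        '--start-maximized',
        '--window-size={0},{1}'.format(width, height),
    ]


def make_headless_options(width=1920, height=1080):
    """Build Chrome options; selenium is imported only when this is called."""
    from selenium.webdriver.chrome.options import Options
    options = Options()
    for arg in headless_chrome_args(width, height):
        options.add_argument(arg)
    return options
```

The browser would then be created with `webdriver.Chrome(options=make_headless_options())`.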
This post was edited based on @kunif's Comment and posted as Community Wiki.