How to Isolate Differences in Selenium Behavior Between a Displayed Browser and Headless Mode

Asked 2 years ago, Updated 2 years ago, 149 views

Background:
I'm a beginner with Selenium.
I'm using Selenium to scrape information from a site.
The information was retrieved successfully when the browser was displayed, but when I ran the same code in headless mode, the error shown under "Error Description" occurred and the information could not be retrieved.

Questions:
How can I isolate the cause when Selenium behaves differently with the browser displayed than it does in headless mode?
How does everyone else approach this? (The only thing I can think of is taking a screenshot when an error occurs, which makes it hard to narrow down the cause.)

Environment:
Python 3.6
Selenium 3.141.0
Chrome Driver 74.0.3729.108

Supplemental:
I checked the screenshot taken at the time of the error and found that, when running headless, accessing a product page redirected me to the login page.
So, when a redirect occurred, I tried adding the cookies from the first successful login.
With this approach, the information was sometimes retrieved successfully and sometimes not. (As a beginner, I may be doing this the wrong way, so please let me know if anything is incorrect.)

Partial code excerpt:

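# Imports required by this excerpt
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Inside read_category_page():
for item_link in item_links:
    try:
        print('open{0}'.format(item_link))
        self.browser.get(item_link)
        # Load the page once more if we ended up on a different URL
        if self.browser.current_url != item_link:
            self.browser.get(item_link)

        # Wait up to 10 seconds for the element with ID 'container_img'
        WebDriverWait(self.browser, 10).until(
            EC.presence_of_element_located((By.ID, 'container_img')))
        store_data.append(_get_product_info(item_link))
    except Exception:
        # Add cookies when redirecting to login page
        print('[ERROR02]exec retry')
        self.browser = cookie.addCookie(self.browser)

        print('open{0}'.format(item_link))
        self.browser.get(item_link)
        print(self.browser.title)

        WebDriverWait(self.browser, 10).until(
            EC.presence_of_element_located((By.ID, 'container_img')))
        store_data.append(_get_product_info(item_link))
    print(store_data)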

Error Description:

[ERROR02]exec retry
Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 197, in read_category_page
    self.browser.get(item_link)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.14.4 x86_64)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 239, in <module>
    style.read_category_page(sex_category, product_category)
  File "/03_hira/src/pages/StyleIsNow.py", line 207, in read_category_page
    self.browser=cookie.addCookie(self.browser)
  File "/03_hira/src/utils/Cookie.py", line 32, in addCookie
    self.browser.add_cookie(cookie)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 894, in add_cookie
    self.execute(Command.ADD_COOKIE, {'cookie': cookie_dict})
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.14.4 x86_64)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/03_hira/src/pages/StyleIsNow.py", line 242, in <module>
    browser.quit()
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1055, in save_screenshot
    return self.get_screenshot_as_file(filename)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1032, in get_screenshot_as_file
    png = self.get_screenshot_as_png()
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1064, in get_screenshot_as_png
    return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 1074, in get_screenshot_as_base64
    return self.execute(Command.SCREENSHOT)['value']
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ipap/.pyenv/versions/3.6.5/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=74.0.3729.108)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.14.4 x86_64)

python3 selenium web-scraping selenium-webdriver chromedriver

2022-09-30 15:36

1 Answer

The articles "When Headless Chrome is stuck" and "Tried running the headless chrome in Python" might be helpful.

Below is an option for adjusting the window size, taken from the former link:

Smaller window sizes can cause unintended results such as HTML corruption and overlapping DOM elements.

options = Selenium::WebDriver::Chrome::Options.new(
  args: ['headless', 'start-maximized', 'window-size=1920,1080']
)
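
Since the question uses Python rather than Ruby, a rough Python equivalent of the options above might look like the following. This is only a minimal sketch, assuming Selenium 3.141.0 as in the question and a chromedriver that can be found on PATH. `headless_chrome_args` and `make_headless_driver` are hypothetical helper names, not part of Selenium, and 1920x1080 is simply the size used in the snippet above.

```python
def headless_chrome_args(width=1920, height=1080):
    """Return Chrome switches mirroring the Ruby example (hypothetical helper)."""
    return [
        "--headless",
        "--start-maximized",
        "--window-size={0},{1}".format(width, height),
    ]


def make_headless_driver(width=1920, height=1080):
    """Start headless Chrome with a fixed window size (hypothetical helper).

    Assumes chromedriver is discoverable on PATH.
    """
    # Imported here so the argument helper above can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args(width, height):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With this, `self.browser = make_headless_driver()` would replace whatever currently creates the driver, and the same code can be run with a smaller size to check whether the window size is what changes the result.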

This post was edited based on @kunif's comment and posted as Community Wiki.
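
On the question of how to isolate the difference beyond screenshots, one possible approach is to record the same set of facts at the same point in both a displayed run and a headless run, and then compare them. The sketch below is an assumption about how that could be done, not an established recipe. `dump_browser_state`, `out_dir`, and `label` are hypothetical names; the function only relies on `current_url`, `title`, `page_source`, `get_cookies()`, and `save_screenshot()`, which Selenium's WebDriver provides.

```python
import json
import os


def dump_browser_state(browser, out_dir, label):
    """Save URL, title, cookie names, HTML, and a screenshot for comparison.

    `browser` is expected to be a Selenium WebDriver (hypothetical helper).
    Files are written as <out_dir>/<label>.json, .html, and .png.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.join(out_dir, label)

    summary = {
        "url": browser.current_url,
        "title": browser.title,
        "cookie_names": sorted(c["name"] for c in browser.get_cookies()),
    }
    # Small JSON summary that is easy to diff between the two runs
    with open(base + ".json", "w", encoding="utf-8") as f:
        json.dump(summary, f, ensure_ascii=False, indent=2)
    # Full page source, to see what the site actually returned
    with open(base + ".html", "w", encoding="utf-8") as f:
        f.write(browser.page_source)
    browser.save_screenshot(base + ".png")
    return summary
```

For example, calling `dump_browser_state(self.browser, 'debug', 'headless')` right after `self.browser.get(item_link)`, and the same with the label `'displayed'` in a normal run, gives two sets of files to diff. A different URL or a missing cookie name would point toward the login redirect rather than the page layout. If the session itself has stopped responding, as the repeated timeouts in the traceback suggest, these calls may time out as well.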


2022-09-30 15:36


