Scraping with Selenium in a Docker environment

Asked 1 year ago, Updated 1 year ago, 60 views

I am currently learning Rails in a Docker environment and experimenting with scraping using Selenium.
It worked fine locally, but as soon as I moved it into Docker, I got an error.

What I want to do
Instead of running headless (with the browser window hidden), I want to display the page being scraped on screen so that I can watch what it does.

Reason: after the target page opens, I enter the password and solve the reCAPTCHA manually, and only then does the automated processing start, so I cannot begin without a visible browser window.

Source for the local environment
This displays the target page; after I enter the login information and solve the reCAPTCHA manually, the data is retrieved automatically:

    require 'selenium-webdriver'
    require 'webdrivers'
    driver = Selenium::WebDriver.for :chrome
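
The manual-then-automatic flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `poll_until`, `scrape_after_manual_login`, the login URL, and the CSS selector that signals a completed login are all hypothetical names that do not appear in the original code. Selenium's own `Selenium::WebDriver::Wait` can serve a similar purpose; a plain polling helper is used here to keep the waiting logic independent of the browser.

```ruby
# Poll until the block returns a truthy value or the timeout (seconds) expires.
# Returns the block's value on success, or nil on timeout.
def poll_until(timeout:, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Hypothetical helper: open a visible browser, wait for a human to log in,
# then hand the driver to the caller's block for the automated part.
def scrape_after_manual_login(login_url, logged_in_selector)
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :chrome
  driver.get(login_url)

  # Give the human up to 5 minutes to enter the password and solve reCAPTCHA.
  ready = poll_until(timeout: 300) do
    !driver.find_elements(css: logged_in_selector).empty?
  end
  raise 'login was not completed in time' unless ready

  yield driver
ensure
  driver&.quit
end
```

A call might look like `scrape_after_manual_login('https://example.com/login', '#account-menu') { |d| puts d.title }`, where both arguments are placeholders.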

Source in the Docker environment (version that raises the error)

    require 'selenium-webdriver'
    require 'webdrivers'
    driver = Selenium::WebDriver.for :chrome # <- the error occurs here

Error message

Selenium::WebDriver::Error::UnknownError (unknown error: Chrome failed to start: exited abnormally
rails_1 | (unknown error: DevToolsActivePort file doesn't exist)
rails_1 | (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.):

Source in the Docker environment (version without the error)

    require 'selenium-webdriver'
    require 'webdrivers'
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    driver = Selenium::WebDriver.for :chrome, options: options

With the --headless option present, as in the error-free Docker version above, there is no error, but because the browser is headless no window appears and I cannot do anything.
I suspect the headless option may be required in a Docker environment, but I would like to get the browser displayed on screen somehow.

Question details

In the first place, is it possible in a Docker environment to display the page being scraped instead of running headless? If it is possible, how is it done? Or is this a completely different kind of problem? If you know, please let me know.
If there is a way to get past reCAPTCHA automatically, headless would be fine...

The host running Docker is macOS Catalina (version 10.15), and the Docker container runs Linux.

* This is my first time asking a question here, so I apologize for any shortcomings. I am a beginner, so please tell me if I have asked the question incorrectly or left out necessary information. Thank you for your help.

ruby-on-rails ruby docker web-scraping selenium

2022-09-30 19:47

1 Answer

I have not tried this with Selenium, so I am not sure this answer will help. To bring the GUI of Google Chrome, launched in a container built from an image such as plain Ubuntu, to the host environment, the container must be connected to the host's X Window System. Since you report no error in headless mode, the crash is probably related to this. The launch options in the Dockerfile maintained at https://github.com/jessfraz/dockerfiles are a helpful reference.
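
As a rough sketch of the point about the X Window System: Chrome needs a reachable X display to open a window, and X clients conventionally locate it through the `DISPLAY` environment variable. The helper below, which is not part of the original code, drops `--headless` only when `DISPLAY` is set in the container. The names `chrome_arguments` and `build_driver` are hypothetical, and the sketch assumes the container has already been given access to an X server (on a macOS host this typically means running one such as XQuartz).

```ruby
# Choose Chrome flags depending on whether an X display is configured.
# `env` defaults to the process environment; a plain Hash also works.
def chrome_arguments(env = ENV)
  args = ['--no-sandbox']
  # Without DISPLAY Chrome cannot open a window, so fall back to headless.
  args << '--headless' if env['DISPLAY'].to_s.empty?
  args
end

# Hypothetical wrapper around the driver setup from the question.
def build_driver(env = ENV)
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Chrome::Options.new
  chrome_arguments(env).each { |arg| options.add_argument(arg) }
  Selenium::WebDriver.for :chrome, options: options
end
```

With this arrangement, the same code keeps working headless when no display is available, and shows a window once `DISPLAY` points at a reachable X server.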

Also, Chrome in the selenium/node-chrome-debug image maintained at https://github.com/SeleniumHQ/docker-selenium can be accessed over VNC, so it might be a good idea to build your image with that one as a reference.
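
If Chrome runs in a separate Selenium container like this, the Rails container would connect to it as a remote driver instead of starting Chrome itself. The following is a minimal sketch, assuming selenium-webdriver 4.x and a Selenium container reachable under the hypothetical host name `chrome` on port 4444 with the `/wd/hub` path; `selenium_endpoint` and `connect_remote_chrome` are illustrative names, not part of any library.

```ruby
require 'uri'

# Build the WebDriver endpoint URL of the Selenium container.
# Host "chrome" and port 4444 are assumptions; adjust them to your setup.
def selenium_endpoint(host: 'chrome', port: 4444)
  URI::HTTP.build(host: host, port: port, path: '/wd/hub').to_s
end

# Connect to Chrome running in the other container. No --headless flag is
# passed, so the browser window can be watched through a VNC viewer.
def connect_remote_chrome(endpoint = selenium_endpoint)
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Chrome::Options.new
  Selenium::WebDriver.for :remote, url: endpoint, options: options
end
```

The password and reCAPTCHA could then be entered by hand in the VNC session before the automated steps continue.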


2022-09-30 19:47
