Max retries exceeded with url and [SSL: CERTIFICATE_VERIFY_FAILED] unable to get local issuer certificate error when using Selenium and BS4

Asked 2 years ago, Updated 2 years ago, 108 views

Hi, everyone. We would like to extract the lists shown in the image below as text from qpldocs.dla.mil/search/parts.aspx?qpl=1780. (Since we are going to go through each list item and perform additional crawling, using Selenium is essential; BS4 code alone is not enough.)

To do this, I tried to extract the list with BeautifulSoup's lxml parser and a for loop, but when I do, I get the error shown below.


>  Max retries exceeded with url: /search/parts.aspx?qpl=1780
  (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate 

If I print just one list item using Selenium alone, without BeautifulSoup, the error does not appear and everything works. (You can confirm this by commenting out the three lines marked with the #error annotation in the code.)

I tried running Install Certificates.command again, but the problem didn't go away. I look forward to your valuable answers. Thank you!
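For reference, a small diagnostic sketch like the following (my addition, not part of the original post) can show which CA bundles Python and requests are actually consulting. It assumes `certifi` is available, which it normally is as a dependency of requests:

```python
import ssl

import certifi  # installed as a dependency of requests

# Show the OpenSSL default CA paths that the ssl module falls back to,
# and the certifi bundle that requests verifies against by default.
print(ssl.get_default_verify_paths())
print(certifi.where())
```

If the paths printed by `ssl.get_default_verify_paths()` point at a missing or stale store, that is consistent with the "unable to get local issuer certificate" failure.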

If my explanation is not enough, please leave a comment so I can explain it in more detail!

Below is my code.

import requests
from bs4 import BeautifulSoup
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time
from urllib.request import urlopen


def set_chrome_driver():
    chrome_options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    return driver

driver = webdriver.Chrome("./chromedriver")
url = "https://qpldocs.dla.mil/search/parts.aspx?qpl=1780"
driver.get(url)
# The 3 lines below are where the error occurs
res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.text,"lxml")

partlist = driver.find_element(By.ID, "Lu_gov_DG_ctl03_btnGovPartNo")
print(partlist.text)


driver.find_element(By.XPATH,"//*[@id='Lu_gov_DG_ctl03_btnGovPartNo']").click()

partauthorizationcompany = driver.find_element(By.ID, "Lu_man_DG_ctl03_lblCompany")
print(partauthorizationcompany.text)

partstatus = driver.find_element(By.ID, "Lu_man_DG_ctl03_imgCompanyStatus")
print(partstatus.text)

The following is the error output, copied verbatim:

> /Users/seoyeonghun/Desktop/Project/WebCrawling_QPD/practice3.py:20: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome("./chromedriver")
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connection.py", line 416, in connect
    self.sock = ssl_wrap_socket(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1070, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

> During handling of the above exception, another exception occurred:

> Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='qpldocs.dla.mil', port=443): Max retries exceeded with url: /search/parts.aspx?qpl=1780 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

> During handling of the above exception, another exception occurred:

> Traceback (most recent call last):
  File "/Users/seoyeonghun/Desktop/Project/WebCrawling_QPD/practice3.py", line 23, in <module>
    res = requests.get(url)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='qpldocs.dla.mil', port=443): Max retries exceeded with url: /search/parts.aspx?qpl=1780 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))

Thank you.

selenium beautifulsoup python

2022-09-20 11:33

1 Answer

The error is caused by SSL certificate verification failing, but you don't actually need to fix that to accomplish your goal.

As written, your code sends two requests to the site: once through Selenium and once through the requests module. The second request is unnecessary.

To parse the HTML that Selenium has already loaded, hand the page source to BeautifulSoup:

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
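To illustrate, here is a self-contained sketch of how that `page_source` output could be parsed to collect every part-number link instead of one fixed id. The HTML string below is a hypothetical stand-in for `driver.page_source`; the real ids on the site follow the `Lu_gov_DG_ctl03_btnGovPartNo` pattern from the question, and the part numbers shown are made up:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real page uses
# ids of the form Lu_gov_DG_ctlNN_btnGovPartNo for each row.
html = """
<table id="Lu_gov_DG">
  <tr><td><a id="Lu_gov_DG_ctl03_btnGovPartNo">EXAMPLE-PART-1</a></td></tr>
  <tr><td><a id="Lu_gov_DG_ctl04_btnGovPartNo">EXAMPLE-PART-2</a></td></tr>
</table>
"""

# "html.parser" is in the stdlib; "lxml" also works if it is installed.
soup = BeautifulSoup(html, "html.parser")

# Collect every government part-number link by matching the id suffix,
# rather than hard-coding a single row id.
parts = [a.get_text(strip=True)
         for a in soup.find_all("a", id=lambda i: i and i.endswith("btnGovPartNo"))]
print(parts)
```

This way a single Selenium page load feeds both the clicking logic and the text extraction, with no second request through the requests module.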

By the way, the site appears to be run by the U.S. government. Have you looked closely at its usage policy before crawling?

If crawling causes problems, they can be difficult to resolve after the fact, so we recommend reviewing the relevant policies before trying, or requesting the data you need through an official channel.


2022-09-20 11:33



© 2024 OneMinuteCode. All rights reserved.