bs4 in Python raises an error while scraping

Asked 2 years ago, Updated 2 years ago, 121 views

I wrote a crawler in Python, but it sometimes raises an error while crawling and extracting the title of a web page. I can't tell which page in the crawl triggers the error, so I can't work out the cause. Other pages are crawled without any problem. I would appreciate any advice.

What I have checked
I suspected pages with no title tag, but that was not the cause.
I suspected titles that were too long, but that was not the cause either.
When the title tag is empty, an empty title is returned without an error.

Error

Traceback (most recent call last):
  File "/vagrant/pysearch-master/manage.py", line 15, in <module>
    US>crawl_web('https://applech2.com/',8)
  File "/vagrant/pysearch-master/web_crawler/crawler.py", line 147, in crawl_web
    title = _get_page_tite(html)
  File "/vagrant/pysearch-master/web_crawler/crawler.py", line 61, in _get_page_tite
    title = BeautifulSoup(html, "html.parser").find('title').text
  File "/home/vagrant/.virtualenvs/dev/local/lib/python3.4/site-packages/bs4/__init__.py", line 192, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

The crawler code is on GitHub:
https://github.com/wimpykid719/pythonengine/blob/master/web_crawler/crawler.py

python python3 web-scraping python-requests

2022-09-29 21:54

1 Answer

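The error was caused by the server denying access with a 403 response. The traceback shows that html was None when it reached BeautifulSoup, which is why len(markup) failed with a TypeError; apparently the denied request left the crawler with no page content to parse.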
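A minimal sketch of one way to guard against this, and to log which URL was denied. The helpers `fetch_html` and `collect_html` are hypothetical names, not part of the original crawler, and `session` is assumed to be a `requests.Session` (any object with a compatible `.get()` would do). Network errors such as timeouts are not handled here.

```python
import logging

logger = logging.getLogger("crawler")


def fetch_html(session, url, timeout=10):
    """Return the page body as text, or None if the server did not answer 200.

    Hypothetical helper: `session` is assumed to behave like requests.Session.
    """
    response = session.get(url, timeout=timeout)
    if response.status_code != 200:
        # Log the URL and status so the failing page can be identified.
        logger.warning("Skipping %s: HTTP %s", url, response.status_code)
        return None
    return response.text


def collect_html(session, urls):
    """Fetch each URL and keep only the pages that actually returned HTML."""
    pages = {}
    for url in urls:
        html = fetch_html(session, url)
        if html is None:
            continue  # never hand None to BeautifulSoup
        pages[url] = html
    return pages
```

With the real libraries this could be called as `collect_html(requests.Session(), urls)`, and each returned string can then be passed to `BeautifulSoup(html, "html.parser")`. A page answered with 403 shows up in the log with its URL instead of crashing the crawl.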

This answer was posted as a community wiki based on @wataru's comments.


2022-09-29 21:54


