Python Scrap Results Not Output

Asked 1 years ago, Updated 1 years ago, 84 views

I'm studying scraping in Python in a virtual environment.This is page 106 of Python Crawling & Scraping.

I copy the code exactly as it is, but the result is not output.I think there is no problem with the code because there are no errors, but why is there no output?

Normally, the following line will be followed by the URL.Thank you for your cooperation.

(scraping)vagrant@ubuntu-bionic:/vagrant$python python_crawler_1.py 

programs:

import requests
import lxml.html

response=requests.get('https://gihyo.jp/dp')
html=lxml.html.fromstring(response.text)
html.make_links_absolute(response.url)

for a in html.cssselect('#listbook>li>a [itemprop="url"]'):
  url = a.get('href')
  print(url)

Runtime Screen:

(scraping)vagrant@ubuntu-bionic:/vagrant$python python_crawler_1.py
(scrapping)vagrant@ubuntu-bionic: /vagrant$

python web-scraping

2022-09-30 15:40

1 Answers

Wrong CSS selector.Correctly #listBook but #listbook.

Additional: The flow of thought

The print(url) line must be executed in order for the URL to appear.If the URL does not appear, this line is probably not running.

The print(url) line is in the for statement.If this line is not executed, the contents of the for statement have never been repeated.First html.cssselect('#listbook>li>a [itemprop="url"]') is suspicious, so let's try print.

print and html.cssselect('#listbook>li>a[itemprop="url"]') You can see that the result of is an empty list If you try to run each element in the list, but pass an empty list, it will never run because there are zero elements.

Now I understand why the URL was not displayed.Now let's think about why it's an empty list.

When I actually looked at the HTML source code of https://gihyo.jp/dp in my browser, I noticed that there was no tag with the ID listbook.There is a tag with a very similar listBook ID, so you can guess that it was probably mistaken for this one.Also, I have confirmed that the tag structure around it can be selected in #listBook>li>a[itemprop="url"].


2022-09-30 15:40

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.