Python scraping script produces no output

I'm studying scraping in Python in a virtual environment. This is from page 106 of Python Crawling & Scraping.

I copied the code exactly as printed, but nothing is output. There are no errors, so I assumed the code was fine. Why is there no output?

Normally, the URLs should be printed after the following line. Thank you for your help.

(scraping) vagrant@ubuntu-bionic:/vagrant$ python python_crawler_1.py

Program:

import requests
import lxml.html

response=requests.get('https://gihyo.jp/dp')
html=lxml.html.fromstring(response.text)
html.make_links_absolute(response.url)

for a in html.cssselect('#listbook>li>a [itemprop="url"]'):
  url = a.get('href')
  print(url)

Output when run:

(scraping) vagrant@ubuntu-bionic:/vagrant$ python python_crawler_1.py
(scraping) vagrant@ubuntu-bionic:/vagrant$

python web-scraping

2022-09-30 15:40

1 Answer

The CSS selector is wrong. The ID should be #listBook, not #listbook.

Addendum: how I worked this out

For the URLs to appear, the print(url) line must be executed. If no URL appears, that line is probably not running at all.

The print(url) line is inside the for statement. If it is not executed, the body of the for statement never ran, not even once. The first suspect is html.cssselect('#listbook>li>a [itemprop="url"]'), so let's print its result.

If you print the result of html.cssselect('#listbook>li>a[itemprop="url"]'), you can see that it is an empty list. A for statement runs its body once per element of the list, so when the list is empty the body never runs because there are zero elements.

That explains why no URL was displayed. Next, let's think about why the list is empty.

When I looked at the HTML source of https://gihyo.jp/dp in my browser, I found no tag with the ID listbook. There is a tag with the very similar ID listBook, so listbook is probably a typo for it. The ID in the selector is matched case-sensitively, so the two spellings are different IDs. I also confirmed that the surrounding tag structure can be selected with #listBook>li>a[itemprop="url"].

2022-09-30 15:40