Hi, everyone. I'm studying by following the Scrapy tutorial, and I've run into something I can't figure out, so I'm asking a question. The code is as follows.
def parse(self, response):
    for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
        url = response.urljoin(href.extract())
        yield scrapy.Request(url, callback=self.parse_dir_contents)
My question is about two parts of the function above: ul.directory.dir-col > li > a::attr('href') and callback=self.parse_dir_contents.
I think the first one is a CSS path, but I wonder whether the ">" symbol is just a replacement for "/". And I don't understand the callback=self.parse_dir_contents part at all, so I'd appreciate it if you could briefly explain the concept.
I've been searching the Internet and looking through the manual, but I still can't figure it out. I'm asking here because I'm frustrated.
python scrapy
First, ul.directory.dir-col > li > a::attr('href')
This is a CSS selector.
In a CSS selector, E > F selects the F elements that are direct children of E. There may be several matching Fs, but only the Fs immediately below E are chosen (a plain space, E F, would match descendants at any depth). So ">" is not a replacement for "/"; it is the CSS child combinator, which plays roughly the same role that "/" plays between the steps of an XPath expression.
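Below is a minimal sketch (not from the original post) showing the difference between the child combinator ">" and the descendant combinator, using scrapy.Selector on a small hypothetical HTML snippet.

from scrapy import Selector

html = """
<ul class="directory dir-col">
  <li><a href="/cats/">Cats</a></li>
  <li><a href="/dogs/">Dogs</a></li>
  <li><span><a href="/hidden/">Hidden</a></span></li>
</ul>
"""
sel = Selector(text=html)

# "> li > a" only matches <a> tags that are direct children of an <li>,
# so the link wrapped in a <span> is skipped.
print(sel.css("ul.directory.dir-col > li > a::attr(href)").getall())
# ['/cats/', '/dogs/']

# A space (descendant combinator) matches <a> tags at any depth below the <li>.
print(sel.css("ul.directory.dir-col li a::attr(href)").getall())
# ['/cats/', '/dogs/', '/hidden/']

The ::attr(href) part at the end simply tells Scrapy to return the value of the href attribute of each matched <a> element instead of the element itself.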
Second, callback=self.parse_dir_contents
The parse() method only extracts the interesting links from the page, builds a full absolute URL using the response.urljoin method (since the links can be relative), and yields new requests to be sent later, registering the method parse_dir_contents() as the callback that will ultimately scrape the data we want. What you see here is Scrapy's mechanism of following links: when you yield a Request in a callback method, Scrapy will schedule that request to be sent and register a callback method to be executed when that request finishes.
In short, parse() appears to be a function that finds the additional links contained in the document at the given URL. As the passage above describes, it extracts the sub-links from the response, registers them as the next crawling targets, and names parse_dir_contents() as the method Scrapy should call with each of those responses to do the actual scraping.
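For context, here is a minimal sketch of how the two methods fit together in a full spider, roughly following the structure of the Scrapy tutorial; the spider name, start URL, and the fields extracted in parse_dir_contents are hypothetical.

import scrapy

class DirectorySpider(scrapy.Spider):
    name = "directory"
    start_urls = ["https://example.com/directory/"]  # hypothetical start page

    def parse(self, response):
        # Step 1: collect the category links and schedule a request for each.
        # Scrapy will call parse_dir_contents with the response of every request.
        for href in response.css("ul.directory.dir-col > li > a::attr(href)").getall():
            url = response.urljoin(href)  # make relative links absolute
            yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        # Step 2: this runs once per followed link; it is where the actual
        # data is scraped (the selectors here are hypothetical).
        for site in response.css("div.site"):
            yield {
                "title": site.css("a::text").get(),
                "link": site.css("a::attr(href)").get(),
            }

So callback=self.parse_dir_contents does not call parse_dir_contents immediately; it just tells Scrapy which method to run later, once the scheduled request has been downloaded.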