Question about a nested for loop while crawling with Scrapy

Asked 2 years ago, Updated 2 years ago, 118 views

I have a question about writing a Scrapy spider. I'm testing some simple code, and the problem is with saving to the DB the items produced inside a nested for loop.

The item-scraping part of the spider is shown below.

    def parse_item(self, response):
        for sel in response.xpath('//*[@id="contents"]/div[10]/section/section[1]/section[1]'):
            item = item()

            # Item scrap area

            for actress in sel.xpath("//*[@itemprop='actors']//*[@itemprop='name']"):
                actress_ = actress.xpath("text()").extract()
                item['Actress'] = actress_[0].strip()
                yield item

I wrote the second for loop because the markup looks like this:

<div class="actress">
    <span>actress 1</span>
    <span>actress 2</span>
    <span>actress 3</span>
</div>

The goal is to scrape actresses 1, 2, and 3. However, after running the spider and checking the DB, only actress 3 is saved. The same happens on the other pages.

If I put print(item['Actress']) right below the yield and check the command window, actresses 1, 2, and 3 are all printed. When I step through with pudb, 1, 2, and 3 all appear to pass through as well. (I'm not sure about this.)

I don't know what the problem is.

As far as I can tell, there's nothing wrong with the pipeline or the settings. In the actual table, only the value from the last span tag is stored.

Please give me some advice on what the problem is. ^^

python scrapy

2022-09-22 15:26

1 Answer

Is item the value that parse_item returns?

If so, only the last value, actress 3, will end up stored, because the same item keeps being overwritten on every pass of the second for statement.
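
To illustrate the overwriting, here is a minimal sketch that uses plain dicts in place of the spider's item class, so it runs without Scrapy. The function names buggy and fixed are hypothetical, and whether this is exactly what happens in your project depends on when the pipeline reads each item, so treat it as an assumption to check rather than a diagnosis.

```python
def buggy(names):
    # One item is created once and then mutated on every iteration,
    # so every yielded value is a reference to the same object.
    item = {}
    for name in names:
        item["Actress"] = name.strip()
        yield item


def fixed(names):
    # A fresh item is created for each actress,
    # so each yielded value keeps its own data.
    for name in names:
        item = {}
        item["Actress"] = name.strip()
        yield item


names = ["actress 1", "actress 2", "actress 3"]

# Collect everything first, the way a consumer that reads items later would.
buggy_rows = list(buggy(names))
fixed_rows = list(fixed(names))

print([row["Actress"] for row in buggy_rows])
print([row["Actress"] for row in fixed_rows])
```

Printing inside the loop shows all three names in both versions, because the value is correct at the moment of the yield. The difference only appears when the items are read afterwards. In the spider, the equivalent change would be to create the item inside the second for statement instead of before it.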


2022-09-22 15:26
