Scrapy spider creation question. I'm writing some simple code and testing it. The problem is with saving to the DB an item that is built inside a nested for loop.
The item-scraping part of the spider is shown below:
def parse_item(self, response):
    for sel in response.xpath('//*[@id="contents"]/div[10]/section/section[1]/section[1]'):
        item = Item()
        # Item scrap area
        for actress in sel.xpath("//*[@itemprop='actors']//*[@itemprop='name']"):
            actress_ = actress.xpath("text()").extract()
            item['Actress'] = actress_[0].strip()
            yield item
The reason I wrote the second for loop is this markup:
<div class="actress">
<span>actress 1</span>
<span>actress 2</span>
<span>actress 3</span>
</div>
I want to scrape actress 1, 2, and 3. However, after running the spider and checking the DB, only actress 3 is saved. The same happens on the other pages.
If I add print(item['Actress']) right under the yield and check the console, all of actress 1, 2, and 3 are printed. Even when I debug with pudb, it looks like 1, 2, and 3 are all yielded (I'm not sure about this).
I don't know what the problem is. There's nothing wrong with the pipeline or the settings; the actual table simply ends up storing only the last span tag. Please give me some advice on what the problem is.

Tags: python, scrapy
Is the value yielded by parse_item the item? If so, only the last value, actress 3, will be saved, because the same item object keeps being overwritten inside the second for loop: every yield hands out a reference to one and the same object, and by the time it is stored it holds only the final value.
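In plain Python terms, this is an object-identity problem: yielding the same mutated dict three times stores three references to one object. A minimal sketch, using plain dicts instead of a Scrapy Item purely to illustrate:

```python
def buggy():
    """Mutate and yield one shared dict -- mimics the spider's bug."""
    item = {}
    for name in ["actress 1", "actress 2", "actress 3"]:
        item['Actress'] = name
        yield item  # same object every time


def fixed():
    """Create a fresh dict per iteration, so each yield is independent."""
    for name in ["actress 1", "actress 2", "actress 3"]:
        item = {}
        item['Actress'] = name
        yield item


# A pipeline that keeps references and writes them out later sees, in the
# buggy version, three pointers to one dict that now holds the last value.
collected = list(buggy())
print([i['Actress'] for i in collected])  # every entry is "actress 3"

collected = list(fixed())
print([i['Actress'] for i in collected])  # actress 1, 2, and 3
```

In the spider, the fix is the same: move the item construction (e.g. item = Item(), or whatever item class your project defines) inside the second for loop, so each yield passes a fresh object to the pipeline.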
© 2024 OneMinuteCode. All rights reserved.