I would like to change start_urls on every run in Scrapy.
As an outline:
when I enter an English word, I would like to use Scrapy to look it up on weblio, an online English dictionary,
and output its meaning.
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import logging
from elscrapy.items import ElscrapyItem


class WordSpider(scrapy.Spider):
    name = 'word'
    allowed_domains = ['ejje.weblio.jp']
    # start_urls = ['http://ejje.weblio.jp/']

    def __init__(self, query='', *args, **kwargs):
        super(WordSpider, self).__init__(*args, **kwargs)
        # Build the start URL from the word passed with -a query=...
        self.start_urls = ['https://ejje.weblio.jp/content/' + query]

    def parse(self, response):
        word = response.xpath('//*[@id="summary"]/div[2]/p/span[2]/text()').get()
        yield {
            'word': word
        }
I change start_urls on each run by specifying a word with the command below.
scrapy crawl word -a query=relative
However, the result is as follows, and the CSV output is blank, so
I don't think it's running correctly.
ERROR:Error processing {'word':'\nrelative, relative, correlative, (with…) in comparison, and related, in response, proportional, indicating a relationship, derived by a relative'}
I want to be able to write the result to CSV.
python web-scraping
The scraped result itself is correct.
The relevant HTML is
<span class="content-explanation ej">
                derived from relational, relative, correlative, (with…) and related, in response, proportional, and relationship;
</span>
Therefore, the content of the span tag
starts with one line break and 16 spaces.
In other words, the output is correct;
it just differs from what the questioner wants to get.
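To see that leading whitespace for yourself without running the spider, here is a minimal sketch using only the standard library. The `SpanText` class and the shortened `SAMPLE_HTML` string are made up for illustration; they stand in for the real weblio page and for the XPath in the question, they are not part of Scrapy.

```python
from html.parser import HTMLParser


class SpanText(HTMLParser):
    """Hypothetical helper: collect the text of the first
    <span class="content-explanation ej"> element."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        wanted = dict(attrs).get("class") == "content-explanation ej"
        if tag == "span" and wanted and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


# Assumed sample: one line break, 16 spaces, then a shortened definition.
SAMPLE_HTML = (
    '<span class="content-explanation ej">\n'
    + " " * 16
    + "relative, correlative\n</span>"
)

parser = SpanText()
parser.feed(SAMPLE_HTML)
parser.close()
raw = "".join(parser.chunks)

# repr() makes the line break and the spaces visible.
print(repr(raw))
```

The printed value begins with the escaped line break followed by the run of spaces, which is the same shape as the value shown in the question's log.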
For example,
word = response.xpath('//*[@id="summary"]/div[2]/p/span[2]/text()').get().replace('\n', '').strip()
How about removing the line breaks explicitly and stripping the leading and trailing whitespace?
(.strip()
alone is sufficient if you know there are no line breaks inside the text.)
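The cleanup can be tried on a plain string first. This is only a sketch: `clean_text` is a hypothetical helper name, and the sample string is shortened. It also passes None through, because `.get()` returns None when the XPath matches nothing, and calling `.replace()` on None would raise an AttributeError.

```python
from typing import Optional


def clean_text(raw: Optional[str]) -> Optional[str]:
    """Remove line breaks and surrounding whitespace; pass None through."""
    if raw is None:
        return None
    return raw.replace("\n", "").strip()


# Assumed sample shaped like the scraped value: line break, 16 spaces, text.
raw = "\n" + " " * 16 + "relative, correlative\n"

print(repr(raw.strip()))       # strip() alone handles leading/trailing whitespace
print(repr(clean_text(raw)))   # same result here, since there is no inner line break
print(clean_text(None))        # no match on the page: None is returned unchanged
```

Note that `replace('\n', '')` joins the text on either side of an inner line break without a space, so if the definition can span several lines you may prefer to replace the line break with a space instead.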
© 2024 OneMinuteCode. All rights reserved.