I want to change start_urls on every run in Scrapy.

Asked 1 year ago, Updated 1 year ago, 86 views

I would like to change start_urls on every run in Scrapy.
In outline: when I enter an English word, I want Scrapy to look it up on Weblio, an online English dictionary, and output its meaning.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import logging
from elscrapy.items import ElscrapyItem


class WordSpider(scrapy.Spider):
    name = 'word'
    allowed_domains = ['ejje.weblio.jp']
    # start_urls = ['http://ejje.weblio.jp/']

    def __init__(self, query='', *args, **kwargs):
        super(WordSpider, self).__init__(*args, **kwargs)
        # Build the start URL from the word passed with -a query=<word>
        self.start_urls = ['https://ejje.weblio.jp/content/' + query]



    def parse(self, response):
        word=response.xpath('//*[@id="summary"]/div[2]/p/span[2]/text()').get()
        yield {
            'word': word
        }

I change start_urls every time by specifying a word with the command below.

scrapy crawl word -a query=relative
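
As a side note, the way the `query` argument ends up in the start URL can be sketched in plain Python. This is only an illustration: `build_start_url` is a hypothetical helper, not part of Scrapy, and percent-encoding the word with `urllib.parse.quote` is an assumption about how multi-word entries might be handled, not something confirmed for Weblio.

```python
from urllib.parse import quote

# Same prefix the spider concatenates with the query.
BASE_URL = "https://ejje.weblio.jp/content/"


def build_start_url(query: str) -> str:
    """Hypothetical helper: build the dictionary page URL for one word."""
    word = query.strip()
    if not word:
        # An empty query would just request the bare prefix, so fail early.
        raise ValueError("query must not be empty")
    # Percent-encode characters such as spaces so the URL stays valid
    # (assumption: the site accepts percent-encoded words).
    return BASE_URL + quote(word)


if __name__ == "__main__":
    print(build_start_url("relative"))
```

Inside the spider, the same idea would amount to `self.start_urls = [build_start_url(query)]` in `__init__`.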

However, the result is as follows, and the CSV output is blank, so
I don't think it is running correctly.

ERROR:Error processing {'word':'\nrelative, relative, correlative, (with…) in comparison, and related, in response, proportional, indicating a relationship, derived by a relative'}

I want to be able to write the result to a CSV file.

python web-scraping

2022-09-30 19:23

1 Answer

The result is correct.
The relevant HTML is:

<span class="content-explanation ej">
                derived from relational, relative, correlative, (with…) and related, in response, proportional, and relationship;
</span>

So the content of the span tag starts with one newline followed by 16 spaces.
In other words, the output is correct;
it just differs from what the asker wants to get.

For example,

word = response.xpath('//*[@id="summary"]/div[2]/p/span[2]/text()').get().replace('\n', '').strip()

How about removing the explicit line breaks and then stripping the leading and trailing whitespace like this?
(.strip() alone is sufficient if you know there are no line breaks inside the text.)
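
The cleanup suggested above can be tried without running the spider. The following is a minimal sketch, assuming the extracted value is either a string or `None` (which `.get()` returns when the XPath matches nothing). `clean_text` is a hypothetical helper name. Unlike the one-liner above, it collapses every run of whitespace, including inner line breaks, into a single space rather than deleting line breaks outright.

```python
from typing import Optional


def clean_text(raw: Optional[str]) -> str:
    """Hypothetical helper: normalize text extracted from the page."""
    if raw is None:
        # No match for the XPath: return an empty string instead of
        # raising AttributeError on None.
        return ""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace; join the pieces with one space.
    return " ".join(raw.split())


if __name__ == "__main__":
    # One newline plus 16 spaces, as in the span content described above.
    sample = "\n" + " " * 16 + "relative, related"
    print(repr(clean_text(sample)))
```

In `parse`, this would be used as `word = clean_text(response.xpath(...).get())` with the same XPath as before.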


2022-09-30 19:23



© 2024 OneMinuteCode. All rights reserved.