Python and Scrapy questions.

Asked 2 years ago, Updated 2 years ago, 149 views

I want to clean a string that I scrape with Scrapy using re.sub before saving it to my data. From the string Hong Gil-dong\n(Hong Gil-dong), I wrote code that removes the \n(Hong Gil-dong) part and saves only Hong Gil-dong. It works fine in the Python shell, but when I run the crawl, the value isn't saved properly. Below is my pipeline code.

import re

class NameRegexPipeline(object):
    def process_item(self, item, spider):
        # Whitespace followed by "(...)" at the end of the string
        pattern = re.compile(r'\s\(.*\)$')
        for n in item['name']:
            item_re = re.sub(pattern, ' ', n)
        item['name'] = item_re
        return item

I don't see anything wrong with the code, but it still doesn't save properly. TT
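For reference, a shell-style check of the pattern on a single string. The sample value, with a newline before the parenthesised part, is an assumption about what the scraped text looks like. This sketch also replaces the match with '' and calls .strip() instead of replacing with ' ' as the pipeline does, so no trailing space is left behind.

```python
import re

# Matches whitespace (including "\n") followed by "(...)" at the end of the string.
PATTERN = re.compile(r'\s\(.*\)$')

# Hypothetical scraped value: name, newline, then a parenthesised variant.
raw = "Hong Gil-dong\n(Hong Gil-dong)"

cleaned = PATTERN.sub('', raw).strip()
print(repr(cleaned))  # 'Hong Gil-dong'
```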

scrapy python

2022-09-22 21:17

1 Answer

The code above works as intended only if the item passed to process_item has the form item={"name": ["Hong Gil-dong (Hong Gil-dong)"]}, that is, item['name'] is a list containing a single string.
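A minimal reproduction without running a crawl, assuming a plain dict can stand in for the Scrapy item (process_item only reads item['name'], and spider is unused, so None is passed). With a one-element list the result is as expected, apart from the trailing space left by the ' ' replacement. With more than one element, item_re is overwritten on every pass, so only the last cleaned value survives and the list is replaced by a single string. The second name below is a hypothetical example.

```python
import re


class NameRegexPipeline(object):
    # Same logic as the pipeline in the question.
    def process_item(self, item, spider):
        pattern = re.compile(r'\s\(.*\)$')
        for n in item['name']:
            item_re = re.sub(pattern, ' ', n)  # overwritten on every pass
        item['name'] = item_re
        return item


pipeline = NameRegexPipeline()

one = pipeline.process_item({"name": ["Hong Gil-dong (Hong Gil-dong)"]}, None)
print(repr(one["name"]))  # 'Hong Gil-dong '  (trailing space from ' ')

# Hypothetical two-element list: only the last element is kept.
two = pipeline.process_item({"name": ["Hong Gil-dong (A)", "Kim Cheol-su (B)"]}, None)
print(repr(two["name"]))  # 'Kim Cheol-su '
```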

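For example, with item={"name": "Hong Gil-dong (Hong Gil-dong)"}, where the value is a plain string rather than a list, you won't get the result you want: iterating over a string yields one character at a time, so item['name'] ends up as just the last character, ")". Check what value is actually being passed in.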
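The string case can be checked the same way, followed by one possible fix. This is a sketch that assumes item['name'] may arrive as either a str or a list of str: it cleans a string as one value, cleans every element of a list, and keeps the original shape. The helper name clean_name is hypothetical, and whether the rest of the project expects a list or a single string is something you may need to adjust.

```python
import re

# Whitespace (including "\n") followed by "(...)" at the end of the value.
NAME_SUFFIX = re.compile(r'\s\(.*\)$')

# What the original loop does with a plain string: one character per pass.
buggy = None
for n in "Hong Gil-dong (Hong Gil-dong)":
    buggy = NAME_SUFFIX.sub(' ', n)
print(repr(buggy))  # ')'


def clean_name(value):
    """Hypothetical helper: strip a trailing '(...)' part and leftover whitespace."""
    return NAME_SUFFIX.sub('', value).strip()


class NameRegexPipeline(object):
    def process_item(self, item, spider):
        name = item['name']
        if isinstance(name, str):
            # A plain string: clean it as one value instead of char by char.
            item['name'] = clean_name(name)
        else:
            # A list (e.g. from a selector's getall()): clean every element.
            item['name'] = [clean_name(n) for n in name]
        return item


pipeline = NameRegexPipeline()
result_str = pipeline.process_item({"name": "Hong Gil-dong\n(Hong Gil-dong)"}, None)
result_list = pipeline.process_item({"name": ["Hong Gil-dong (Hong Gil-dong)"]}, None)
print(result_str)   # {'name': 'Hong Gil-dong'}
print(result_list)  # {'name': ['Hong Gil-dong']}
```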


2022-09-22 21:17



© 2024 OneMinuteCode. All rights reserved.