I want to clean up a string scraped with Scrapy using re.sub before saving it. The string looks like "Hong Gil-dong (Hong Gil-dong)". I wrote code to remove the "\n (Hong Gil-dong)" part from this string and save only the "Hong Gil-dong" part. It works as expected in the Python shell, but when I run the crawl, the value is not saved properly. Below is my pipeline code.
class NameRegexPipeline(object):
    def process_item(self, item, spider):
        pattern = re.compile(r'\s\(.*\)$')
        for n in item['name']:
            item_re = re.sub(pattern, ' ', n)
            item['name'] = item_re
        return item
I can't see anything wrong with the code, so I don't know why it doesn't save properly.
scrapy python
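The code above works as intended only if the item passed to process_item has the form item = {"name": ["Hong Gil-dong (Hong Gil-dong)"]}, that is, when "name" holds a list of strings.
If "name" is a plain string instead, for example item = {"name": "Hong Gil-dong (Hong Gil-dong)"}, the for loop iterates over the string one character at a time, the pattern never matches, and you won't get the result you want. Check what the input value actually looks like.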
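To make the difference concrete, here is a minimal sketch that runs the loop from the question on plain dicts (no Scrapy needed) for both input shapes, followed by one possible variant of the pipeline that accepts either shape. The helper name original_process, the choice to keep only the first cleaned name, and replacing with '' instead of ' ' are assumptions made for illustration, not something stated in the question.

```python
import re

# Same pattern as in the question: whitespace, then a parenthesised suffix at the end.
PATTERN = re.compile(r'\s\(.*\)$')


def original_process(item):
    """The loop from the question, applied to a plain dict for illustration."""
    for n in item['name']:
        item_re = re.sub(PATTERN, ' ', n)
        item['name'] = item_re
    return item


class NameRegexPipeline(object):
    """Hypothetical variant that accepts either a string or a list of strings."""

    def process_item(self, item, spider):
        names = item['name']
        if isinstance(names, str):
            names = [names]  # wrap a bare string so the loop sees whole strings
        # Assumption: replace with '' rather than ' ' so no trailing space remains.
        cleaned = [PATTERN.sub('', n) for n in names]
        # Assumption: only the first name is kept, stored as a plain string.
        item['name'] = cleaned[0] if cleaned else ''
        return item


list_item = original_process({"name": ["Hong Gil-dong (Hong Gil-dong)"]})
str_item = original_process({"name": "Hong Gil-dong (Hong Gil-dong)"})
print(repr(list_item["name"]))  # 'Hong Gil-dong ' (note the trailing space)
print(repr(str_item["name"]))   # ')' because the loop walked over single characters

fixed = NameRegexPipeline().process_item(
    {"name": "Hong Gil-dong (Hong Gil-dong)"}, spider=None)
print(repr(fixed["name"]))      # 'Hong Gil-dong'
```

Printing or logging item['name'] at the top of process_item during a crawl is probably the quickest way to see which of the two shapes your spider actually produces.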