I want to delete the second hyphen and everything after it from each URL in out.csv.
The following error appears. I know it's a type error, but
I don't know what is causing it or how to fix it.
TypeError: findall() missing 1 required positional argument: 'string'
import re
import csv
with open('out.csv', encoding='utf-8') as f:
    reader = csv.reader(f)
    URL = re.findall(r'^[^-]*-[^-]*')
    for URL in reader:
        print(f'{URL}')
https://www.abcde.com/-0w69e7e1w00- Ayeo
https://www.abcde.com/-0w69e7e9w70- Kakikakeko Corporation
https://www.abcde.com/-0w08e1e0w00- This is the last time I'm sorry.
https://www.abcde.com/-0w69e7e1w70- Right away
https://www.abcde.com/-0w69e6e2w54- What's wrong with you?
re.findall has the following signature, so you must pass the string to be processed as the second argument. The error occurs because that second argument is missing.
re.findall(pattern, string, flags=0)
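As a minimal sketch of the two-argument form, here is the pattern from the question applied to one of the sample lines. The variable names line and matches are only illustrative.

```python
import re

# One sample line from the question, used as the string to search.
line = "https://www.abcde.com/-0w69e7e1w00- Ayeo"

# Pattern from the question: everything up to, but not including, the second hyphen.
matches = re.findall(r'^[^-]*-[^-]*', line)

print(matches)  # ['https://www.abcde.com/-0w69e7e1w00']
```

Because the pattern has no capture groups, re.findall returns a list of the full matches, so the trimmed URL is matches[0].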
Moreover, the code in the question never uses the result of re.findall: the for loop reuses the same variable name, URL, for each row read from the csv, so the re.findall call accomplishes nothing.
URL=re.findall(r'^[^-]*-[^-]*')
for URL in reader:
Perhaps what you really meant to use is re.compile? That would explain why only one argument is passed.
Even if you use it, you must give different names to the variable that holds the pattern object returned by re.compile and the variable that receives each row of the csv.
You then use the compiled pattern object inside the for loop.
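A rough sketch of that approach might look like the following. It assumes the URL sits in the first column of each row, and it uses io.StringIO with two sample rows from the question as a stand-in for open('out.csv', encoding='utf-8'). The names url_pattern, sample, and trimmed are made up for illustration.

```python
import csv
import io
import re

# Compiled pattern kept under its own name so the loop variable cannot overwrite it.
url_pattern = re.compile(r'^[^-]*-[^-]*')

# Stand-in for the real file; replace with open('out.csv', encoding='utf-8').
sample = io.StringIO(
    "https://www.abcde.com/-0w69e7e1w00- Ayeo\n"
    "https://www.abcde.com/-0w69e7e9w70- Kakikakeko Corporation\n"
)

trimmed = []
for row in csv.reader(sample):
    if not row:
        continue  # skip blank lines
    m = url_pattern.match(row[0])  # assumption: the URL is in the first column
    if m:
        trimmed.append(m.group(0))

print(trimmed)
```

Keeping the compiled pattern and the row in separate variables is the point here: the pattern object survives the loop and is applied to every row.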
Alternatively, if you want to keep re.findall, you should either read and process the file as plain text instead of going through the csv module, or call re.findall on each csv row inside the for loop.
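The plain-text variant could be sketched roughly as follows. The helper trim_after_second_hyphen is a hypothetical name, and io.StringIO with sample lines from the question again stands in for the real out.csv.

```python
import io
import re

def trim_after_second_hyphen(lines):
    """Collect the part of each line that precedes its second hyphen.

    Lines containing no hyphen at all produce no match and are skipped;
    lines with a single hyphen are returned whole.
    """
    results = []
    for line in lines:
        # re.findall needs both the pattern and the string to search.
        found = re.findall(r'^[^-]*-[^-]*', line.rstrip('\n'))
        results.extend(found)
    return results

# Stand-in for open('out.csv', encoding='utf-8').
sample = io.StringIO(
    "https://www.abcde.com/-0w08e1e0w00- This is the last time I'm sorry.\n"
    "https://www.abcde.com/-0w69e7e1w70- Right away\n"
)
trimmed = trim_after_second_hyphen(sample)
print(trimmed)
```

Since any iterable of strings works, the same function can be handed the file object from a with open(...) block directly.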
By the way, I think these articles will help you with regular expressions for URLs:
How do I match Python regular expressions to a specific URL?
Check and extract URLs with python regular expressions
Retrieve domain in url regular expression (python)
Python regular expression again-match url
gruber/Liberal Regex Pattern for Web URLs
Characters allowed and not allowed in the URL
Detecting URLs by Regular Expressions
For now, staying close to the original code and not using the csv module, a rough version looks like this:
import re

# Group 1 captures the URL up to, but not including, the second hyphen.
pattern = re.compile(r"(https?:\/\/[\w:%#$&\?\(\)~\.=\+\-]+\/-[^-,]*)-[^-,]*")
with open('out.csv', encoding='utf-8') as f:
    for row in f:
        m = pattern.match(row)
        if m:
            print(m.group(1))
Here's the result:
https://www.abcde.com/-0w69e7e1w00
https://www.abcde.com/-0w69e7e9w70
https://www.abcde.com/-0w08e1e0w00
https://www.abcde.com/-0w69e7e1w70
https://www.abcde.com/-0w69e6e2w54