A question for anyone who is confident with Python regular expressions

Asked 2 years ago, Updated 2 years ago, 18 views

First question

import re

def commaParse(num):
    # Lookahead containing a capturing group: findall returns the captured digits.
    return re.findall(r'(?=(\d{3}))', num)

a = commaParse('100000000')
print(a)

Result: ['100', '000', '000', '000', '000', '000', '000']

Second question

import re

def commaParse(num):
    # Lookahead without a capturing group: findall returns the (empty) matches.
    return re.findall(r'(?=\d{3})', num)

a = commaParse('100000000')
print(a)

Result: ['', '', '', '', '', '', '']

Here's my question. It may seem trivial, but I can't find an answer no matter how much I search.
The only difference between the two patterns is the extra pair of parentheses inside the lookahead, (?=\d{3}) versus (?=(\d{3})), but I want to know why the results are so different.
In the first case, since it is a positive lookahead, I expected ['100', '000', '000'], but I don't know why it comes out like that...
And in the second case, I know that a positive lookahead match returns an empty string, so I expected ['', '', ''], but why are there more?

python

2022-09-22 18:53

1 Answer

This is a positive lookahead.

A quick search turns up plenty of explanations, so I'll skip the basics...

word(?=ahead)

This regular expression finds the text word, but only where word is immediately followed by ahead. The ahead part is checked but is not included in the match.
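As a rough illustration, here is a minimal sketch of that behaviour. The sample strings and variable names are made up for this example and are not from the question.

```python
import re

# 'word' is matched only when 'ahead' follows it immediately;
# the text checked by the lookahead is not part of the match.
followed = re.findall(r'word(?=ahead)', 'wordahead')
not_followed = re.findall(r'word(?=ahead)', 'word alone')

print(followed)      # ['word']
print(not_followed)  # []
```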

The regular expression you wrote has nothing in the position corresponding to word, so there is no pattern that actually gets matched. That makes it an odd regular expression whose intent is unclear.

However, in (?=(\d{3})) the inner () parentheses form a capturing group, so findall returns the captured text as its result, even though that is not what you actually meant to match.

On top of that, a lookahead (?=...) does not consume any characters, so the text it inspected is still available for subsequent matches.

So, contrary to your intention, you end up with 7 matches.

If the three digits were found and consumed as intended, the search would resume after them. Because a lookahead consumes nothing, the search instead moves forward one position at a time and captures the pattern at every position: 100 at position 0, then 000 at each of positions 1 through 6.

Since the pattern contains nothing outside the lookahead, the actual match at each position is the empty string immediately before the three digits. Those three digits are captured only because of the () parentheses, so there are 7 results in total.
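The stepping described here can be made visible with re.finditer, which exposes the position of each match. This is only a sketch; the variable names text and steps are my own.

```python
import re

text = '100000000'

# Each match is zero-width (start == end); group(1) holds the digits
# captured inside the lookahead at that position.
steps = [(m.start(), m.end(), m.group(1))
         for m in re.finditer(r'(?=(\d{3}))', text)]

for start, end, digits in steps:
    print(start, end, digits)
```

Every line printed has the same start and end position, which shows that nothing is consumed, and the start position advances by one each time until fewer than three digits remain.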

Now you can see why the second pattern, (?=\d{3}), which has no capturing parentheses, produces 7 empty strings.
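Putting the two patterns side by side may help. re.findall returns the group's text when the pattern has one capturing group, and the whole match otherwise; here the whole match is always empty. The variable names below are only for illustration.

```python
import re

text = '100000000'

# One capturing group: findall returns group 1 of each match.
with_group = re.findall(r'(?=(\d{3}))', text)

# No capturing group: findall returns each whole match, which is empty.
without_group = re.findall(r'(?=\d{3})', text)

print(len(with_group), with_group)
print(len(without_group), without_group)
```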

Just \d{3} on its own is enough to match three digits at a time, as originally intended.
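For example, keeping the function name from the question, a version without any lookahead could look like this:

```python
import re

def commaParse(num):
    # \d{3} consumes three digits per match, so the search
    # resumes after them instead of one position later.
    return re.findall(r'\d{3}', num)

a = commaParse('100000000')
print(a)  # ['100', '000', '000']
```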


2022-09-22 18:53
