I'd like a regular expression that matches the part of a string before a specific substring, and that matches the whole original string when the substring is not present.
What should I do?
What I want to extract:
If the target contains "aaa" or "bbb", the string before it;
if it contains neither, the whole original string.
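To pin down the intended behavior without any regular expression, here is a minimal sketch in Python (assumed, since the answers use Python). The helper name `expected_prefix` is hypothetical and it assumes only the two fixed markers "aaa" and "bbb": it cuts the string at the earliest marker and otherwise returns it unchanged, so it can serve as a reference to check candidate patterns against.

```python
def expected_prefix(s, markers=("aaa", "bbb")):
    """Hypothetical reference: text before the earliest marker, else the whole string."""
    # Index of each marker that occurs in s (str.find returns -1 when absent)
    hits = [i for i in (s.find(m) for m in markers) if i != -1]
    # Cut at the earliest marker; if none occurs, keep the original string
    return s[:min(hits)] if hits else s


print(expected_prefix("AAAaaaZZZ"))  # AAA
print(expected_prefix("AAAcccZZZ"))  # AAAcccZZZ
```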
Regular expression I tried:
"(.*?)(aaa|bbb).*"
This captures the preceding string when the target contains "aaa" or "bbb", but
the overall match also includes "aaa" or "bbb" (and everything after it), and
if the target contains neither "aaa" nor "bbb", it does not match anything.
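The three symptoms above can be reproduced with a short sketch, assuming Python's `re` module as in the answers (the helper `show_tried` is hypothetical). It shows that the whole match still contains the marker, that only group 1 holds the wanted prefix, and that a string without a marker gives no match at all.

```python
import re

# The pattern from the question: a lazy prefix, then "aaa" or "bbb", then the rest
TRIED = "(.*?)(aaa|bbb).*"


def show_tried(s):
    """Return (whole match, captured prefix), or None when nothing matches."""
    m = re.match(TRIED, s)
    if m is None:
        return None
    # group(0) is the whole match, which still contains "aaa"/"bbb" and the tail;
    # group(1) is only the prefix the question actually wants
    return m.group(0), m.group(1)


for s in ["AAAaaaZZZ", "AAAbbbZZZ", "AAAcccZZZ"]:
    print(s, "->", show_tried(s))
```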
Answer
re.sub replaces the end-string pattern and everything after it with an empty string; what remains is the extraction result.
Only a string that contains a match for the end-string pattern is modified, so if there is no match, the original string is returned unchanged.
code
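import re

def extract(text, pattern):
    # Delete the end-string pattern and everything after it;
    # if the pattern does not match, the original string is returned unchanged
    return re.sub(pattern, "", text)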
Calling code
end_pattern = "(aaa|bbb).*"
target1 = "AAAaaaZZZ"
target2 = "AAAbbbZZZ"
target3 = "AAAcccZZZ"
print("End string pattern=[" + end_pattern + "]")
print("Target string 1=[" + target1 + "]")
print("Target string 2=[" + target2 + "]")
print("Target string 3=[" + target3 + "]")
result1 = extract(target1, end_pattern)
print("Result 1=[" + result1 + "]")
result2 = extract(target2, end_pattern)
print("Result 2=[" + result2 + "]")
result3 = extract(target3, end_pattern)
print("Result 3=[" + result3 + "]")
Results
End string pattern=[(aaa|bbb).*]
Target string 1=[AAAaaaZZZ]
Target string 2=[AAAbbbZZZ]
Target string 3=[AAAcccZZZ]
Result 1=[AAA]
Result 2=[AAA]
Result 3=[AAAcccZZZ]
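A few edge cases of the re.sub approach, as a sketch in Python (the extra inputs and the `extract_dotall` variant are hypothetical, not from the question). Only the first marker matters, because `.*` then consumes the rest of the line; a marker at the very start gives an empty result; and since `.` does not match a newline by default, multi-line input would need `re.DOTALL`.

```python
import re


def extract(text, pattern):
    # Same idea as the answer: delete the end-string pattern and everything after it
    return re.sub(pattern, "", text)


def extract_dotall(text, pattern):
    # Hypothetical variant: re.DOTALL lets "." (and so ".*") also match newlines
    return re.sub(pattern, "", text, flags=re.DOTALL)


END_PATTERN = "(aaa|bbb).*"

print(repr(extract("AAAbbbXXaaaZZZ", END_PATTERN)))      # 'AAA'
print(repr(extract("aaaZZZ", END_PATTERN)))              # ''
print(repr(extract("AAAaaa\nZZZ", END_PATTERN)))         # 'AAA\nZZZ'
print(repr(extract_dotall("AAAaaa\nZZZ", END_PATTERN)))  # 'AAA'
```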
Answer
If you can modify the code, I think Akira ejiri's answer is enough.
This answers the comment saying that the regular expression is kept outside the source code, so the code that applies it should not be changed, and the regular expression's match itself must be exactly the string you want to cut out.
import re

a = "AAAaaaZZZ"
b = "AAAbbbZZZ"
c = "AAAcccZZZ"
# Lazily match up to (but not including) "aaa", "bbb", or the end of the string
pattern = "(.*?)(?=aaa|bbb|$)"
print(re.match(pattern, a).group(0))
print(re.match(pattern, b).group(0))
print(re.match(pattern, c).group(0))
Example output
AAA
AAA
AAAcccZZZ
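Depending on the code you use for the match, you may need to adjust it slightly, but the two problems are solved as follows:
It also matches "aaa" and "bbb"
→ Use a positive lookahead (?=...), as metropolis commented; it checks what follows without including it in the match.
Nothing matches when the string contains neither "aaa" nor "bbb"
→ Add $, which matches the end of the string, as another alternative alongside "aaa" and "bbb".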
This is the solution.
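The lazy `.*?` in this pattern also matters: with a greedy `.*`, the `$` alternative always succeeds at the very end, so the match would swallow the whole string. A sketch comparing the two in Python (the `GREEDY` pattern and the `matched` helper are hypothetical, for contrast only; single-line input is assumed, since `.` does not match a newline).

```python
import re

LAZY = "(.*?)(?=aaa|bbb|$)"   # pattern from the answer
GREEDY = "(.*)(?=aaa|bbb|$)"  # hypothetical greedy variant, for contrast only


def matched(pattern, s):
    # The lookahead only tests what follows; it never becomes part of group(0)
    m = re.match(pattern, s)
    return m.group(0) if m else None


for s in ["AAAaaaZZZ", "AAAbbbZZZ", "AAAcccZZZ", "AAAbbbXXaaaZZZ"]:
    print(s, "lazy:", matched(LAZY, s), "greedy:", matched(GREEDY, s))
```

With the lazy pattern the first marker wins ("AAAbbbXXaaaZZZ" gives "AAA"), while the greedy variant returns every input unchanged.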
© 2024 OneMinuteCode. All rights reserved.