If the string does not contain the delimiter, I want the regular expression to match the whole original string.

Asked 1 year ago, Updated 1 year ago, 62 views

I'd like to use a regular expression to match the text that comes before a specific string.
If the specific string is not present, I want to match the whole original string instead.
How can I do this?

What I want to extract:
If the target contains "aaa" or "bbb", the text before it.
If it contains neither, the whole original string.

Regular expression I tried:

"(.*?)(aaa|bbb).*"

This matches the preceding text when the string contains "aaa" or "bbb", but
the match also includes "aaa" or "bbb" itself, and
when the string contains neither "aaa" nor "bbb", nothing matches at all.
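
The behaviour described above can be reproduced with a short sketch. The variable names here are illustrative only and do not come from the original post.

```python
import re

# Pattern from the question: group 1 is the text before "aaa"/"bbb".
tried = re.compile(r"(.*?)(aaa|bbb).*")

hit = tried.match("AAAaaaZZZ")
print(hit.group(0))  # AAAaaaZZZ  (whole match: delimiter and tail included)
print(hit.group(1))  # AAA        (only the text before "aaa")

# No "aaa" or "bbb" in the string, so match() returns None.
miss = tried.match("AAAcccZZZ")
print(miss)  # None
```

So the wanted text is available as group 1 when the delimiter is present, but the caller has to handle the None case separately.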

python regular-expression

2022-09-30 11:52

2 Answers

Answer
Use re.sub to replace everything from the end-string pattern onward with an empty string; what remains is the extracted result.
Only the part that matches the end-string pattern is deleted (replaced with an empty string), so if nothing matches, you get the original string back unchanged.

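Code

import re

def extract(text, pattern):
    # Delete everything from the first match of the end pattern onward;
    # if nothing matches, re.sub returns the string unchanged.
    return re.sub(pattern, "", text)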

Call Code

end_pattern = "(aaa|bbb).*"
target1 = "AAAaaaZZZ"
target2 = "AAAbbbZZZ"
target3 = "AAAcccZZZ"
print("End string pattern=[" + end_pattern + "]")
print("Target string 1=[" + target1 + "]")
print("Target string 2=[" + target2 + "]")
print("Target string 3=[" + target3 + "]")

result1 = extract(target1, end_pattern)
print("Result 1=[" + result1 + "]")

result2 = extract(target2, end_pattern)
print("Result 2=[" + result2 + "]")

result3 = extract(target3, end_pattern)
print("Result 3=[" + result3 + "]")

Results

End string pattern=[(aaa|bbb).*]
Target string 1=[AAAaaaZZZ]
Target string 2=[AAAbbbZZZ]
Target string 3=[AAAcccZZZ]
Result 1=[AAA]
Result 2=[AAA]
Result 3=[AAAcccZZZ]
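
One caveat worth checking if the input can span several lines: by default "." does not match a newline, so the re.sub approach only removes text up to the end of the line containing "aaa" or "bbb". The sketch below assumes a hypothetical two-line input and shows the difference the re.DOTALL flag makes.

```python
import re

text = "AAAaaaZZZ\nsecond line"  # hypothetical multi-line input

# Default: "." stops at the newline, so the second line is kept.
default_result = re.sub(r"(aaa|bbb).*", "", text)

# With re.DOTALL, "." also matches newlines, so everything after "aaa" goes.
dotall_result = re.sub(r"(aaa|bbb).*", "", text, flags=re.DOTALL)

print(repr(default_result))  # 'AAA\nsecond line'
print(repr(dotall_result))   # 'AAA'
```

Which behaviour is right depends on the data, so this is something to decide per use case rather than a fix to the answer.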


2022-09-30 11:52

If you are able to change the code, I think Akira ejiri's answer is sufficient.

This answer addresses the comment that the regular expression is kept outside the source code, so the code that applies it cannot be changed and the match itself must cover exactly the string you want to cut out.

import re

a="AAAaaaZZZ"
b="AAAbbbZZZ"
c="AAAcccZZZ"

pattern="(.*?)(?=aaa|bbb|$)"

print(re.match(pattern,a).group(0))
print(re.match(pattern,b).group(0))
print(re.match(pattern,c).group(0))

Example Execution

AAA
AAA
AAAcccZZZ

Depending on the code you use for matching, you may need to adjust it slightly, but the two problems in the question are solved as follows:

  • It also matches "aaa" and "bbb"

    → Use a positive look-ahead (?=...), as metropolis commented. A look-ahead checks that "aaa" or "bbb" follows without including it in the match.

  • If it does not contain "aaa" or "bbb", it does not match anything

    → In addition to the targets "aaa" and "bbb", put $, which represents the end of the string, as another alternative in the same look-ahead.

This is the solution.
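
The two points can be checked with a small sketch. The helper name prefix_before_marker is hypothetical, and the sketch assumes single-line input, since "." does not match a newline by default.

```python
import re

# Pattern from the answer above; the helper name is hypothetical.
PATTERN = re.compile(r"(.*?)(?=aaa|bbb|$)")

def prefix_before_marker(text):
    """Return the text before the first "aaa"/"bbb", or all of a
    single-line text if neither occurs."""
    return PATTERN.match(text).group(0)

print(prefix_before_marker("AAAbbbZZZ"))     # AAA
print(prefix_before_marker("AAAcccZZZ"))     # AAAcccZZZ
print(repr(prefix_before_marker("aaaZZZ")))  # '' (empty match, not None)

# Without the "$" alternative, nothing matches when the markers are absent.
print(re.match(r"(.*?)(?=aaa|bbb)", "AAAcccZZZ"))  # None
```

Note that a string starting with "aaa" or "bbb" produces an empty match rather than no match, so code that tests the result for truthiness should compare against None instead.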


2022-09-30 11:52



© 2024 OneMinuteCode. All rights reserved.