Detect nested brackets in Python

Asked 2 years ago, Updated 2 years ago, 128 views

When brackets are nested in a string, how can I detect only the outermost pair?

"That kind of ""eo"" or ""ki"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke""

For example, starting with the string above,

"Eo ""Ka"" and ""Ka"" and ""Ka"" and ""Ka"" and ""Ka"" and ""Ka"""
I'd like to extract only these two places and insert a newline before and after each.

text="That kind of "eo" or "ki" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or "ke" or
import re
regex = re.compile(r'(「([^「」]*「[^「」]*」[^「」]*)*[^「」]*」)')
text = re.sub(regex, r'\n\1\n', text)

With the code above, newlines are inserted before and after the ka 'ki' and 'ke' part and before 'tsute'.
This handles double nesting, but I want it to work for triple, quadruple, or any depth of nesting, even when I don't know the number of levels in advance.

I am using Python 3.5.1.

Based on the answers, I put together the function below.
It can now extract the parts enclosed in brackets.

# Insert a newline before and after each bracketed part (applies to the outermost brackets when they are nested)
def add_separator_before_and_after_brackets(text, width=10, separator_pre='\n', separator_post='\n'):
    import regex  # Note: the third-party regex module (extended regular expressions), not the standard re module.

    # If the bracketed part is at least width characters long, insert a separator before and after it.
    def add_separator(match_obj):
        match_text = match_obj.group()
        if len(match_text) >= width:
            return '{}{}{}'.format(separator_pre, match_text, separator_post)
        else:
            return match_text

    rexp1 = regex.compile(r'(「(?>[^「」]+|(?R))*」)')
    rexp2 = regex.compile(r'(『(?>[^『』]+|(?R))*』)')
    rexp3 = regex.compile(r'(【(?>[^【】]+|(?R))*】)')
    rexp4 = regex.compile(r'(\((?>[^\(\)]+|(?R))*\))')
    rexp5 = regex.compile(r'(\[(?>[^\[\]]+|(?R))*\])')
    rexp6 = regex.compile(r'(“(?>[^“”]+|(?R))*”)')
    for rexp in [rexp1, rexp2, rexp3, rexp4, rexp5, rexp6]:
        text=rexp.sub(add_separator, text)
    return text

python regular-expression

2022-09-30 21:22

3 Answers

I think the easiest way is to count the brackets. In other words:

  • Prepare one variable that counts the nesting level, read the string one character at a time from the beginning, and add 1 to the level at each opening bracket and subtract 1 at each closing bracket.
  • Along the way, each stretch from "the level went from 0 to 1" to "the level went from 1 to 0" is enclosed by an outermost pair of brackets.
  • If the level ever becomes negative, the brackets do not match up properly, so treat it as an error.
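The counting steps above can be sketched roughly as follows. This is only a minimal illustration: the function names outermost_spans and wrap_outermost are made up for this example, and the default bracket characters 「 and 」 are an assumption about the input. It avoids f-strings so that it should also run on Python 3.5.

```python
def outermost_spans(text, open_ch="「", close_ch="」"):
    """Return (start, end) index pairs of the outermost bracketed parts."""
    spans = []
    depth = 0      # current nesting level
    start = None   # index where the current outermost bracket opened
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i  # level goes 0 -> 1: an outermost bracket opens
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched closing bracket at index {}".format(i))
            if depth == 0:
                spans.append((start, i + 1))  # level goes 1 -> 0
    if depth != 0:
        raise ValueError("unclosed opening bracket")
    return spans


def wrap_outermost(text, pre="\n", post="\n", open_ch="「", close_ch="」"):
    """Insert pre/post around every outermost bracketed part."""
    pieces = []
    last = 0
    for start, end in outermost_spans(text, open_ch, close_ch):
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

For instance, wrap_outermost('ab「cd「e」f」g「h」i') returns 'ab\n「cd「e」f」\ng\n「h」\ni'. Inner brackets are left alone because they never bring the level back to 0.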

Alternatively, you could write a proper parser, but I feel that would essentially be doing the same thing.
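For comparison, a hand-written recursive parser might look roughly like this. The name parse_nested and the tree representation (plain runs as strings, bracketed groups as nested lists) are choices made for this sketch, not something taken from the question. Because it recurses once per nesting level, very deep nesting would hit Python's recursion limit.

```python
def parse_nested(text, open_ch="「", close_ch="」"):
    """Parse text into a tree: str for plain runs, list for a bracketed group."""
    pos = 0  # shared cursor into text

    def parse_items(inside):
        nonlocal pos
        items = []
        buf = []  # characters of the current plain run
        while pos < len(text):
            ch = text[pos]
            if ch == open_ch:
                if buf:
                    items.append("".join(buf))
                    buf = []
                pos += 1
                items.append(parse_items(True))  # descend one level
            elif ch == close_ch:
                if not inside:
                    raise ValueError("unmatched closing bracket at index {}".format(pos))
                pos += 1
                if buf:
                    items.append("".join(buf))
                return items  # back up one level
            else:
                buf.append(ch)
                pos += 1
        if inside:
            raise ValueError("unclosed opening bracket")
        if buf:
            items.append("".join(buf))
        return items

    return parse_items(False)
```

With ASCII parentheses, parse_nested("a(b(c)d)e(f)", "(", ")") returns ['a', ['b', ['c'], 'd'], 'e', ['f']]. The top-level list entries that are themselves lists correspond to the outermost bracketed parts.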

If you really want to use regular expressions, you need one that supports recursion; in Python that means the third-party regex module instead of re. For example, the following regular expression works:

(?<rec>「(?:[^「」]+|(?&rec))*」)

This (extended) regular expression reads as follows:

  • (?<rec>...): Remember this pattern under the name rec.
  • (?:[^「」]+|(?&rec)): Either a run of characters that contains no corner brackets, or a match of rec itself.
    • (?:...) is a group that is matched but not captured (Reference).
    • rec itself means "a string from an opening bracket to its corresponding closing bracket", so it is fine for it to contain brackets.

Takayuki's answer contains a more compact regular expression (although it is essentially the same), so it is worth a look as well.

Also, regular expressions may hurt performance. I'm not familiar with how Python's regular expression engines are implemented, so if performance matters, measure it and switch to the simple counting method described above if needed.

Related

Incidentally, a pure regular expression cannot check whether brackets are properly nested.
The set of well-matched bracket strings is called the Dyck language, and it is known in theory not to be a regular language.
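One way to see this limitation in practice: with the standard re module you can only write a pattern for a fixed maximum depth. The sketch below builds such a pattern by nesting it inside itself a given number of times. The helper name fixed_depth_pattern is hypothetical, and ASCII parentheses are used as the default brackets purely for illustration.

```python
import re


def fixed_depth_pattern(max_depth, open_ch="(", close_ch=")"):
    """Build a pattern for bracketed text nested at most max_depth levels deep."""
    o, c = re.escape(open_ch), re.escape(close_ch)
    plain = "[^{}{}]".format(o, c)  # any single non-bracket character
    # Depth 1: brackets containing no further brackets.
    pattern = "{o}{p}*{c}".format(o=o, p=plain, c=c)
    # Each extra level allows the previous pattern to appear inside.
    for _ in range(max_depth - 1):
        pattern = "{o}(?:{p}|{inner})*{c}".format(o=o, p=plain, inner=pattern, c=c)
    return re.compile(pattern)
```

A depth-2 pattern handles double nesting, so fixed_depth_pattern(2).findall("a(b(c)d)e") gives ['(b(c)d)']. On triple nesting it misses the outermost pair: fixed_depth_pattern(2).findall("(a(b(c)))") gives ['(b(c))'], while fixed_depth_pattern(3) matches the whole '(a(b(c)))'. Whatever depth you pick, an input nested one level deeper will defeat it, which is why counting or a recursive pattern is needed for unknown depth.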

For more information on this point, Wikipedia articles and the book Introduction to Regular Expression Technology may be helpful.


2022-09-30 21:22

There is an answer to this in an English-language question.
https://stackoverflow.com/questions/26385984/recursive-pattern-in-regex

Install regex to use recursive matches

(venv)> pip install regex
Collecting regex
  Downloading regex-2017.02.08-cp35-none-win_amd64.whl (242kB)
    100% |##################################|245kB 1.5MB/s
Installing collected packages: regex
Successfully installed regex-2017.2.8

Run

(venv)> python3
>>> import regex
>>> text = 'あい「うえ「お」かき「く」けこ」さし「すせ」そ'
>>> f = regex.compile(r'(「(?>[^「」]+|(?R))*」)')
>>> f.findall(text)
['「うえ「お」かき「く」けこ」', '「すせ」']
>>> print(f.sub(r'\n\1\n', text))
あい
「うえ「お」かき「く」けこ」
さし
「すせ」
そ


2022-09-30 21:22

The question asks for regular expressions, but for reference, here is a different approach.

Replace 「 with [ and 」 with ] in the string, so that it converts into a nested list of the position (index) of each character. Then pick out only the lists inside that list, and take the desired substring using the first and last elements of each.

#!/usr/bin/python3

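def parse(txt):
  # Build a list literal of character indexes in which each bracket pair
  # becomes a nested list, then evaluate it.
  lst = eval(
    str(sum(
      [
        ['[' + str(i+1)] if v == '「' else
        [str(i-1) + ']'] if v == '」' else
        [i] for (i, v) in enumerate(txt)
      ], [])).replace("'", '')
  )

  # Top-level elements that are lists correspond to the outermost brackets.
  return [txt[l[0]:l[-1]+1] for l in lst if isinstance(l, list)]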

##
text=[
  "Ai, ""Eo"" or ""ki"" or ""ke"" or ""ko"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke"" or ""ke" or ""ke"
  ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
  '',
]

[print(parse(t)) for t in text]
## =>
## ["Oh" or "Ki" or "Ki" or "Ki" or "Ki"]
## ['','] "'', 'Ah'', 'i'', 'u']
## []


2022-09-30 21:22



© 2024 OneMinuteCode. All rights reserved.