How do I detect only the outermost brackets when brackets are nested in a string?
あいう「えお「かき」く「け」こ」さしすせそたち「つて」と
For example, from a string like the one above,
「えお「かき」く「け」こ」
「つて」
I'd like to extract only these two parts and insert a newline before and after each of them.
import re
text = "あいう「えお「かき」く「け」こ」さしすせそたち「つて」と"
regex = re.compile(r'(「([^「」]*「[^「」]*」)*[^「」]*」)')
text = re.sub(regex, r'\n\1\n', text)
With the code above, a newline is inserted before and after 「えお「かき」く「け」こ」 and before and after 「つて」.
This handles double nesting, but I want to handle triple, quadruple, or deeper nesting, even when I don't know how many levels there are.
I am using Python 3.5.1.
Based on the answers I received, I put together the following function.
It can now extract the parts enclosed in brackets.
# Insert a separator before and after each bracketed part (for nested brackets, only the outermost pair)
def add_separator_before_and_after_brackets(text, width=10, separator_pre='\n', separator_post='\n'):
    import regex  # Note: the third-party regex module (extended regular expressions), not the standard re module.

    # If the bracketed part is at least width characters long, insert the separators before and after it.
    def add_separator(match_obj):
        match_text = match_obj.group()
        if len(match_text) >= width:
            return '{}{}{}'.format(separator_pre, match_text, separator_post)
        else:
            return match_text

    rexp1 = regex.compile(r'「(?>[^「」]+|(?R))*」')
    rexp2 = regex.compile(r'『(?>[^『』]+|(?R))*』')  # bracket pair assumed
    rexp3 = regex.compile(r'【(?>[^【】]+|(?R))*】')  # bracket pair assumed
    rexp4 = regex.compile(r'\((?>[^\(\)]+|(?R))*\)')
    rexp5 = regex.compile(r'\[(?>[^\[\]]+|(?R))*\]')
    rexp6 = regex.compile(r'“(?>[^“”]+|(?R))*”')  # bracket pair assumed
    for rexp in [rexp1, rexp2, rexp3, rexp4, rexp5, rexp6]:
        text = rexp.sub(add_separator, text)
    return text
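I think the easiest way is to count the brackets: scan the string while tracking the nesting depth, and treat a bracket that opens at depth 0, together with the bracket that brings the depth back to 0, as an outermost pair.
Alternatively you could write a proper parser, but I feel that would essentially be doing the same thing.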
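The bracket-counting approach can be sketched roughly as follows. This is only a minimal illustration, not the code from the original answer: the function name outermost_spans and its parameters are made up for this example, and it assumes the brackets in question are 「 and 」.

```python
def outermost_spans(text, open_ch="「", close_ch="」"):
    """Return the outermost bracketed substrings, brackets included.

    Hypothetical helper for illustration; unmatched brackets are ignored.
    """
    spans = []
    depth = 0      # current nesting depth
    start = None   # index where the current outermost bracket opened
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i          # an outermost bracket opens here
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:         # the outermost bracket closes here
                spans.append(text[start:i + 1])
    return spans


text = "あいう「えお「かき」く「け」こ」さしすせそたち「つて」と"
print(outermost_spans(text))
# ['「えお「かき」く「け」こ」', '「つて」']
```

Because only a counter is kept, the depth of nesting does not matter, and the string is scanned once from left to right.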
If you really want to use regular expressions, you can use an engine that supports recursive patterns; in Python that means the regex module instead of re. For example, the following regular expression works:
(?<rec>「(?:[^「」]+|(?&rec))*」)
This (extended) regular expression reads as follows:
(?<rec>...): remembers this group under the name rec.
(?:[^「」]+|(?&rec)): matches either a run of characters that contains no corner brackets, or rec itself.
(?:...) is a group that takes part in matching but does not capture (reference).
Takayuki's answer contains a more compact regular expression (essentially the same), so I think it is also helpful.
Also, regular expressions may hurt performance. I'm not familiar with how Python's regular expression engines are implemented, so measure it, and if it turns out to be a problem, switch to the simple counting method described above.
Related
By the way, a pure regular expression cannot check whether brackets are properly nested.
The set of strings of correctly matched brackets is called the Dyck language, and it is known in theory not to be a regular language.
For more on this point, the Wikipedia article and the book Introduction to Regular Expression Technology may be helpful.
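To make the limitation concrete, here is a small sketch using only the standard re module. It builds a pattern by nesting it a fixed number of times, so it can match brackets only up to a depth chosen in advance. The function name bounded_pattern is invented for this example, and the brackets are assumed to be 「 and 」.

```python
import re


def bounded_pattern(max_depth):
    """Compile a pattern matching 「...」 nested at most max_depth levels.

    Hypothetical helper for illustration; max_depth is assumed to be >= 1.
    """
    plain = "[^「」]"                      # any character that is not a bracket
    pattern = "「{}*」".format(plain)      # depth 1: no brackets inside
    for _ in range(max_depth - 1):
        # one more level: the inside may contain plain characters or
        # a complete bracketed part of the previous depth
        pattern = "「(?:{}|{})*」".format(plain, pattern)
    return re.compile(pattern)


text = "a「b「c「d」」」e"
print(bounded_pattern(3).findall(text))  # ['「b「c「d」」」']
print(bounded_pattern(2).findall(text))  # ['「c「d」」']
```

With max_depth=2 the outermost pair is missed, because the pattern has no way to describe a third level. Every additional level needs a longer pattern, which is why an unknown depth calls for recursion or for counting.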
An English-language question already has an answer for this.
https://stackoverflow.com/questions/26385984/recursive-pattern-in-regex
Install regex to use recursive matches
(venv)> pip install regex
Collecting regex
Downloading regex-2017.02.08-cp35-none-win_amd64.whl (242kB)
100% |##################################|245kB 1.5MB/s
Installing collected packages: regex
Successfully installed regex-2017.2.8
Run
(venv)> python3
>>> import regex
>>> text = "あいう「えお「かき」く「け」こ」さしすせそたち「つて」と"
>>> f = regex.compile(r'「(?>[^「」]+|(?R))*」')
>>> f.findall(text)
['「えお「かき」く「け」こ」', '「つて」']
>>> print(f.sub(r'\n\g<0>\n', text))
あいう
「えお「かき」く「け」こ」
さしすせそたち
「つて」
と
You asked for a regular expression, but for reference here is a different solution.
Replace 「 with [ and 」 with ] in the string, turning it into a list of position information (the index) for each character. Then pick out only the lists inside that list, and take the desired substring from the first and last elements of each one.
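#!/usr/bin/python3
def parse(txt):
    # Build the text of a nested list literal: 「 opens a list, 」 closes it,
    # and every other character contributes its index. Then eval that text.
    lst = eval(
        str(sum(
            [
                ['[' + str(i + 1)] if v == '「' else
                [str(i - 1) + ']'] if v == '」' else
                [i] for (i, v) in enumerate(txt)
            ], [])).replace("'", '')
    )
    # Top-level lists are the outermost brackets; slice out their contents.
    return [txt[l[0]:l[-1] + 1] for l in lst if isinstance(l, list)]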
##
text = [
    "あいう「えお「かき」く「け」こ」さしすせそたち「つて」と",
    "「」「あ」「い」「う」",
    "",
]
[print(parse(t)) for t in text]
## =>
## ['えお「かき」く「け」こ', 'つて']
## ['', 'あ', 'い', 'う']
## []