I'd like to validate the start and end tags of HTML elements in Java, and to do that I'd like to understand the following regular expression.
(<(\\s*(?:/\\s*)?[tag name](?:\\s*|\\s+[^>]+))(?:>|(?=<)|$(?!\\n)))
I found it in a reference that checks tags this way, but what does this regular expression mean?
I tried analyzing it myself, but it is hard to understand. If you have any advice, please let me know.
Additional information
I'm trying to analyze it in my own way. I would appreciate your advice.
①
(<(\s*(?:/\s*)?[tag name](?:\s*|\s+[^>]+))
(?:>|(?=<)|$(?!\n)))
Are these bold parentheses "grouping"? Do they give priority to what is inside them? Do they separate what comes before the tag name (for example a) from what comes after it?
<
Is \\s* "zero or more spaces"?
<
(?:/\\s*)
I don't really understand this part.
Does it mean a tag with one or more ":"? I wonder whether there is ever a tag that contains ":"...
And is the / required? Also, does the rest mean "zero or more blanks"?
<
What is the ? after the parentheses? I mean the ? just before [tag name].
<
(?:\s*|\s+[^>]+)
Are these bold brackets grouped?
<
?:
Is it a colon? I don't really understand the colon.
<
Is \s* a blank space of 0 characters or more?
<
Does | mean either what comes before it or what comes after it, that is, \\s* or \\s+[^>]+?
<
Is \s+ at least one blank?
<
Is [^>]+ at least one character other than >?
<
(?:>|(?=<)|$(?!\n))
Are the bold brackets grouped?
<
?: Is it a colon here as well?
<
Does > mean that a > is required?
<
?:>|(?=<)|$(?!\n)
Do the pipes mean one of "?:>", "(?=<)" or "$(?!\n)"?
I would appreciate guidance on the rest as well.
java regular-expression
Hello. I suggest copying the regular expression into this site and reading the analysis it produces.
If you can read English, you will be able to follow it as it is.
After matching <, the first open parenthesis allows whitespace. The second open parenthesis allows an optional / followed by whitespace, so that both opening and closing tags are recognized. Then comes the tag name, and the third open parenthesis allows either whitespace alone or whitespace followed by characters other than >. The fourth open parenthesis closes the tag with >, or accepts a following < or the end of the input as long as no newline comes right after it.
I think that is roughly how it reads.
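To check that reading against actual inputs, here is a minimal sketch. It assumes the tag name a in place of the [tag name] placeholder, and the names TagPatternDemo, tagPattern and isTag are made up for illustration; they do not come from the question or from any library.

```java
import java.util.regex.Pattern;

final class TagPatternDemo {
    // Hypothetical helper: puts a concrete tag name where the question
    // has the [tag name] placeholder. Pattern.quote keeps the name literal.
    static Pattern tagPattern(String tagName) {
        String regex = "(<(\\s*(?:/\\s*)?" + Pattern.quote(tagName)
                + "(?:\\s*|\\s+[^>]+))(?:>|(?=<)|$(?!\\n)))";
        return Pattern.compile(regex);
    }

    // True if the whole input is one start or end tag with the given name.
    static boolean isTag(String tagName, String input) {
        return tagPattern(tagName).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"<a>", "</a>", "< / a >", "<a href=\"x\">", "<a", "<b>", "<abc>"};
        for (String s : samples) {
            System.out.println(s + " -> " + isTag("a", s));
        }
    }
}
```

One thing this makes visible: "<a" with no closing > is also accepted, because the last group allows the end of the input in place of >.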
First of all, the regular expression you wrote appears to be one embedded in a Java string literal. You probably already know that "\\" in a Java string literal represents the single character \. However, \ also has a meaning in the Markdown notation used here on Stack Overflow, so your post shows a mixture of \\s and \s.
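The point about string literals can be confirmed with a few lines of Java. This is only a small sketch; the class name EscapeDemo and the method whitespaceRegex are invented for the example.

```java
final class EscapeDemo {
    // The Java literal "\\s*" holds three characters at run time: \ s *
    static String whitespaceRegex() {
        return "\\s*";
    }

    public static void main(String[] args) {
        String regex = whitespaceRegex();
        System.out.println(regex);           // what the regex engine actually receives
        System.out.println(regex.length());  // number of characters in the string
        System.out.println(" \t".matches(regex)); // a space and a tab are both whitespace
    }
}
```

So \\s in source code and \s in the regular expression are the same thing seen at two different levels.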
In the explanation below, I put " on both sides of anything that is written as it appears inside a string literal. (" itself does not appear in this regular expression, so this should not cause confusion.) If you reshape the regular expression in your question according to that rule, it looks like this.
"(<(\\s*(?:/\\s*)?[tag name](?:\\s*|\\s+[^>]+))(?:>|(?=<)|$(?!\\n)))"
(Gaps may appear in odd places, but no spaces are intended there.)
I hope your browser displays this in a readable way. Apart from the [tag name] placeholder, only three characters in it stand for themselves: <, / and >.
All the other characters are metacharacters, each with a special meaning. Every metacharacter that can be used in Java's regular expressions is listed in Java's official documentation. (It is not your fault if your reaction is "I don't get it!", because that documentation is hardly easy to follow. Please read it together with other explanatory articles. Also, some details may differ depending on the Java version.)
From that list, here are the constructs used in the regular expression in your question, written with the same quoting rule.
"(X)"      X, as a capturing group
"\\s"      A whitespace character: [ \t\n\x0B\f\r]
"X*"       X, zero or more times
"(?:X)"    X, as a non-capturing group
"X?"       X, once or not at all
"X|Y"      Either X or Y
"X+"       X, one or more times
"[^abc]"   Any character except a, b, or c (negation)
"(?=X)"    X, via zero-width positive lookahead
"$"        The end of a line
"(?!X)"    X, via zero-width negative lookahead
"\\n"      The newline character ('\u000A')
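As for the numbered items in your postscript: (?: ) is a single construct, so splitting it into the parentheses and ?: and asking about each piece separately does not work. For that reason I won't comment on the items one by one. Please read this answer (and the linked documentation), and ask again if anything is still unclear.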
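To see a few of these constructs in action, here is a hedged sketch. It again assumes the tag name a for the [tag name] placeholder, and the names ConstructDemo, group and firstMatch are hypothetical helpers written for this example. The two short patterns at the end are simplified fragments, not the full expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConstructDemo {
    // The expression from the question with the tag name a filled in (an assumption).
    static final Pattern TAG_A = Pattern.compile(
            "(<(\\s*(?:/\\s*)?a(?:\\s*|\\s+[^>]+))(?:>|(?=<)|$(?!\\n)))");

    // Returns capturing group n of the first match, or null if nothing matches.
    static String group(String input, int n) {
        Matcher m = TAG_A.matcher(input);
        return m.find() ? m.group(n) : null;
    }

    // Returns the text matched by the first find() of any regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Only a plain "(" creates a numbered group; "(?:", "(?=" and "(?!" do not.
        System.out.println(TAG_A.matcher("").groupCount());

        // Group 1 is the whole tag, group 2 is what sits between < and >.
        System.out.println(group("text </a> more", 1));
        System.out.println(group("text </a> more", 2));

        // (?=<) only checks that < comes next; the < is not part of the match.
        System.out.println(group("<a<b>", 1));

        // $ alone also matches just before a final line terminator ...
        System.out.println(firstMatch("<a$", "<a\n"));
        // ... while $(?!\n) rejects that position, so nothing is found here.
        System.out.println(firstMatch("<a$(?!\\n)", "<a\n"));
    }
}
```

This is also why (?: has to be read as one unit: the ?: is what turns a capturing group into a non-capturing one, and it has no meaning on its own.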