Understanding Regular Expressions in Java for HTML

Asked 1 year ago, Updated 1 year ago, 34 views

How should I write a Java regular expression for the following HTML?

<HTML><HEAD><META http-equiv="Content-Type" content="text/html; charset=UTF-8"></HEAD><BODY><DIV style="background-color: ;text-align:left;word-break:break-all;word-wrap:break-word;"><DIV><SPAN style="color:#0041c2;"><STRONG>test_1</STRONG></SPAN><SPAN><A href="dummy_1">[file_name_1]</A></SPAN></DIV><DIV><SPAN style="font-family:HGP創英角ゴシックUB;font-size:32px;"><STRONG>test_2</STRONG></SPAN><SPAN><A href="dummy_2">[filename_2]</A></SPAN><SPAN><STRONG>test_3</STRONG></SPAN><SPAN style="font-family:HGP創英角ゴシックUB;font-size:32px;">test_4</SPAN></DIV></BODY></HTML>

Given the HTML above, how can I use a Java regular expression to extract each element from <span> through </span>?

<SPAN style="color:#0041c2;"><STRONG>test_1</STRONG></SPAN>
<SPAN><A href="dummy_1">[file_name_1]</A></SPAN>
<SPAN style="font-family:HGP創英角ゴシックUB;font-size:32px;"><STRONG>test_2</STRONG></SPAN>
<SPAN><A href="dummy_2">[filename_2]</A></SPAN>
<SPAN><STRONG>test_3</STRONG></SPAN>
<SPAN style="font-family:HGP創英角ゴシックUB;font-size:32px;">test_4</SPAN>

I'd like to extract the elements like this. The pattern I tried is:

<span(\"[^\"]*\"|'[^']*'|[^'\">])*>(\"[^\"]*\"|'[^'\">])*<(\"[^\"]*\"|'[^'\"*'|[^'\">].*?/span><>

I specified the pattern as shown above, but it doesn't work.

java html regular-expression

2022-09-29 22:11

1 Answer

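A pattern that begins with <span(\" does not clearly separate the two forms an opening tag can take: <span> with no attributes, and <span followed by whitespace and attributes, as in <span ~>.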

The sample code below uses a regular expression that handles both <span> and <span ~>.
In the pattern, \\s (written with a doubled backslash inside a Java string literal) matches a whitespace character such as a space or tab.
The ? after + makes the quantifier reluctant (shortest match), so each match ends at the first </SPAN> instead of the last one.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String html = "<HTML><HEAD><META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"></HEAD><BODY><DIV style=\"background-color: ;text-align:left;word-break:break-all;word-wrap:break-word;\"><DIV><SPAN style=\"color:#0041c2;\"><STRONG>test_1</STRONG></SPAN><SPAN><A href=\"dummy_1\">[file_name_1]</A></SPAN></DIV><DIV><SPAN style=\"font-family:HGP創英角ゴシックUB;font-size:32px;\"><STRONG>test_2</STRONG></SPAN><SPAN><A href=\"dummy_2\">[filename_2]</A></SPAN><SPAN><STRONG>test_3</STRONG></SPAN><SPAN style=\"font-family:HGP創英角ゴシックUB;font-size:32px;\">test_4</SPAN></DIV></BODY></HTML>";

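        // Match <SPAN followed by > or whitespace, then the shortest run up to the next </SPAN>.
        Pattern p = Pattern.compile("<SPAN[>\\s].+?</SPAN>");
        Matcher m = p.matcher(html);
        // Print each matched SPAN element on its own line.
        while (m.find()) {
            System.out.println(m.group());
        }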
    }
}
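
To see why the reluctant quantifier matters, here is a minimal sketch that runs the greedy and reluctant forms of the same pattern over a shortened input. The class name QuantifierDemo, the helper findAll, and the two-element sample string are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Hypothetical helper: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // Shortened sample input (an assumption, not the HTML from the question).
        String html = "<SPAN>a</SPAN><SPAN style=\"x\">b</SPAN>";

        // Greedy: .+ runs on to the last </SPAN>, so both elements come back as one match.
        System.out.println(findAll("<SPAN[>\\s].+</SPAN>", html));

        // Reluctant: .+? stops at the first </SPAN>, giving one match per element.
        System.out.println(findAll("<SPAN[>\\s].+?</SPAN>", html));
    }
}
```

With the greedy form the whole string is returned as a single match. With the reluctant form each SPAN element is returned separately, which is the behavior the question asks for.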

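One more point that may be relevant: Java patterns are case-sensitive by default, so a pattern written with lowercase <span will not match the uppercase <SPAN tags in this HTML unless a flag is set. The sketch below is one possible variation, assuming you want to accept either case and also allow a span's content to wrap across lines. The class name SpanExtractor, the method extractSpans, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanExtractor {

    // CASE_INSENSITIVE: <span and <SPAN both match.
    // DOTALL: . also matches line terminators, so content may span several lines.
    private static final Pattern SPAN = Pattern.compile(
            "<span[>\\s].+?</span>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns each span element found in the input, in order.
    static List<String> extractSpans(String html) {
        List<String> spans = new ArrayList<>();
        Matcher m = SPAN.matcher(html);
        while (m.find()) {
            spans.add(m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up input mixing upper and lower case, with a line break inside a span.
        String html = "<DIV><SPAN>one</SPAN><span class=\"c\">two\nlines</span></DIV>";
        for (String span : extractSpans(html)) {
            System.out.println(span);
        }
    }
}
```

Like the original pattern, this stops at the first closing tag, so a span nested inside another span would be cut short. It fits the HTML in the question, where the span elements are not nested.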

2022-09-29 22:11


