Conditionally wrapping XML text nodes in element nodes

Asked 1 year ago, updated 1 year ago, 123 views

Given an XML sample like the one below, I would like to convert it to HTML with XSL.

<root>
    <section>
        <container>
            aaaa Corporation
            <box>
                book
            </box>
            bbb
            <box>
                pen
            </box>
            ccc
            <superscript>
                3
            </superscript>
            ddd
        </container>
    </section>
</root>

Is it possible to get the following result with XSL? I would like to wrap "aaa", "bbb", and "ccc3ddd" in p tags, and convert "box" to div tags and "superscript" to span tags. I would appreciate any advice.

<div>
    <p>aaa</p>
    <div>book</div>
    <p>bbb</p>
    <div>pen</div>
    <p>ccc<span>3</span>ddd</p>
</div>

xml xsl

2022-09-30 19:14

3 Answers

I have written many XSLT transformations, and I have often wished the input XML were tagged a little more properly. But since I could not change the customer's source data, I had to absorb a lot of these problems on the XSLT stylesheet side.

Here too, it would be trivial if "aaa", "bbb", and "ccc" through "ddd" were already tagged with p, but they are not. In this case, I think it can be solved as a grouping problem: group the node()s under the container element by whether or not each one is a box element.

  • Convert each box element to a div element individually.
  • Group each run of adjacent non-box nodes into a p element.
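The grouping idea can also be sketched outside XSLT. The following Python snippet is only an illustration of "group adjacent nodes by whether they are a box", using the standard library's `itertools.groupby` and `xml.dom.minidom`. The function names and the shortened sample are hypothetical, HTML escaping is omitted, and whitespace-only groups are skipped, which is a simplification and not necessarily what an XSLT processor does.

```python
from itertools import groupby
from xml.dom import minidom


def is_box(node):
    """Grouping key: True for <box> elements, False for everything else."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "box"


def render(node):
    """Render one node; text is whitespace-normalized (no HTML escaping)."""
    if node.nodeType == node.TEXT_NODE:
        return " ".join(node.data.split())
    if node.nodeType != node.ELEMENT_NODE:
        return ""
    inner = "".join(render(child) for child in node.childNodes)
    if node.tagName == "box":
        return "<div>" + inner + "</div>"
    if node.tagName == "superscript":
        return "<span>" + inner + "</span>"
    return inner


def convert_container(xml_text):
    """Wrap each run of adjacent non-box nodes in <p>; emit boxes as <div>."""
    container = minidom.parseString(xml_text).documentElement
    parts = []
    for box_group, nodes in groupby(container.childNodes, key=is_box):
        html = "".join(render(n) for n in nodes)
        if box_group:
            parts.append(html)
        elif html:  # skip groups that contain only whitespace
            parts.append("<p>" + html + "</p>")
    return "<div>" + "".join(parts) + "</div>"


SAMPLE = """<container>
    aaa
    <box>
        book
    </box>
    bbb
    <box>
        pen
    </box>
    ccc
    <superscript>
        3
    </superscript>
    ddd
</container>"""

print(convert_container(SAMPLE))
# <div><p>aaa</p><div>book</div><p>bbb</p><div>pen</div><p>ccc<span>3</span>ddd</p></div>
```

`groupby` only merges consecutive items with the same key, which is the same behavior that `group-adjacent` provides in `xsl:for-each-group`.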

Building on argus' stylesheet, I implemented this with XSLT 2.0's xsl:for-each-group as follows.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:output method="html" encoding="UTF-8" />

    <xsl:template match="/">
        <html>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="section">
        <xsl:for-each select="container">
            <div>
                <xsl:for-each-group select="node()" group-adjacent="name(.) eq 'box'">
                    <xsl:choose>
                        <xsl:when test="current-group()[self::box]">
                            <xsl:apply-templates select="current-group()"/>
                        </xsl:when>
                        <xsl:otherwise>
                            <p>
                                <xsl:apply-templates select="current-group()"/>
                            </p>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each-group>
            </div>
        </xsl:for-each>
    </xsl:template>

    <xsl:template match="box">
        <div>
            <xsl:apply-templates/>
        </div>
    </xsl:template>

    <xsl:template match="superscript">
        <span>
            <xsl:apply-templates/>
        </span>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>

</xsl:stylesheet>

The results are as follows.

<html>
   <body>
      <div>
         <p>aaa</p>
         <div>book</div>
         <p>bbb</p>
         <div>pen</div>
         <p>ccc<span>3</span>ddd
         </p>
      </div>
   </body>
</html>

I think this is a fairly natural stylesheet. (For some reason there is a line break after "ddd", which is the one thing I find questionable.)

I hope this is useful as a reference.


2022-09-30 19:14

I have never actually used XSL, but my rough understanding is that it lets you describe how to traverse the DOM at a meta level.
In general, if you try to parse text nodes with DOM operations, the code becomes very verbose.
For example, take the contents of the box tag: the actual text node contains CR, TAB, TAB, TAB, book, CR. In a browser you only see "book", but that is only because of the browser's rendering rules.
Likewise, when parsed strictly, the section tag is reported as containing a text node of CR [TAB] in addition to the container. This applies not only at the ends of tags: every line break that people insert for readability is reported as a text node containing a CR.
In the end, it is very difficult to handle all of this by hand.
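As a rough illustration of the whitespace text nodes described above, here is a small Python sketch using the standard library's `xml.dom.minidom`. The helper name `describe_children` and the shortened sample are made up for this example; the sample uses LF and TAB for indentation.

```python
from xml.dom import minidom

# Hypothetical, shortened sample; indentation is written out as \n and \t.
SAMPLE = "<container>\n\taaa\n\t<box>\n\t\tbook\n\t</box>\n\tbbb\n</container>"


def describe_children(xml_text):
    """List the direct children of the root as (kind, value) pairs."""
    root = minidom.parseString(xml_text).documentElement
    result = []
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(("text", child.data))  # whitespace is kept as-is
        elif child.nodeType == child.ELEMENT_NODE:
            result.append(("element", child.tagName))
    return result


for kind, value in describe_children(SAMPLE):
    print(kind, repr(value))
# text '\n\taaa\n\t'
# element 'box'
# text '\n\tbbb\n'
```

The line breaks and tabs around "aaa" and "bbb" are part of the text nodes, so any hand-written DOM code has to trim or normalize them itself.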

If possible, it would be better to change the data structure and add tags around the meaningful text nodes.


2022-09-30 19:14

While reading sken2's answer, it occurred to me that this could be done by brute force.

sample.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <xsl:template match="/">
    <html>
    <body>
    <xsl:apply-templates/>
    </body>
    </html>
  </xsl:template>

  <xsl:template match="section">
    <xsl:for-each select="container">
      <xsl:variable name="item" select="tokenize(normalize-space(.), ' ')"/>
      <div>
      <xsl:for-each select="text()|*">
        <xsl:choose>
          <xsl:when test="name(.)='box'">
            <xsl:variable name="n" select="position()-1"/>
            <p><xsl:value-of select="$item[$n]"/></p>
            <div><xsl:value-of select="normalize-space(.)"/></div>
          </xsl:when>
          <xsl:when test="name(.)='superscript'">
            <xsl:variable name="n" select="position()-1"/>
            <xsl:variable name="m" select="position()+1"/>
            <p>
              <xsl:value-of select="$item[$n]"/>
              <span><xsl:value-of select="normalize-space(.)"/></span>
              <xsl:value-of select="$item[$m]"/>
            </p>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each>
      </div>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

I used saxonb-xslt as the XSLT processor. Because the stylesheet uses the tokenize function, it requires an XSLT 2.0 processor.

The for-each with choose walks the nodes in document order, so the node order is preserved. However, as sken2 said, I think it would be better to change the data structure. For example, moving aaa or ccc into attributes makes parsing easier.

<box id="aaa">book</box>
<superscript id1="ccc" id2="ddd">3</superscript>
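To show why the attribute-based structure is easier to process, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `container` wrapper, the second `box` with `id="bbb"`, and the function name `convert` are assumptions added for illustration; HTML escaping is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical restructured input: the loose text has moved into attributes.
SAMPLE = (
    '<container>'
    '<box id="aaa">book</box>'
    '<box id="bbb">pen</box>'
    '<superscript id1="ccc" id2="ddd">3</superscript>'
    '</container>'
)


def convert(xml_text):
    """Build the target HTML from attributes; no text-node handling needed."""
    container = ET.fromstring(xml_text)
    parts = []
    for el in container:
        text = (el.text or "").strip()
        if el.tag == "box":
            parts.append("<p>%s</p><div>%s</div>" % (el.get("id", ""), text))
        elif el.tag == "superscript":
            parts.append(
                "<p>%s<span>%s</span>%s</p>"
                % (el.get("id1", ""), text, el.get("id2", ""))
            )
    return "<div>" + "".join(parts) + "</div>"


print(convert(SAMPLE))
# <div><p>aaa</p><div>book</div><p>bbb</p><div>pen</div><p>ccc<span>3</span>ddd</p></div>
```

Each element now carries everything needed to produce its own output, so there is no need to look at neighboring text nodes or positions.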


2022-09-30 19:14


