[No Language] How to delete parent elements whose child elements contain a specific string in XML

Asked 2 years ago, Updated 2 years ago, 116 views


From the XML file below (Example 1. Before Conversion), I would like to delete every whole <dependency> element that contains the string "test".

About 2000 elements containing "test" are scattered throughout the file.
I want this to be fully automated because it is a recurring task.
I have been considering whether sed could do it.

(Example 1. Before conversion) → Delete elements containing test → (Example 2. After conversion)

Could you give me some advice on how to remove the elements that contain "test"?

I intend to run this from a shell script, but any tool or language is fine.

I suspect this is easy for anyone who has implemented it before or who knows the right keywords to search for, so even an answer that only gives the relevant keywords would be very helpful.

(Example 1. Before Conversion)

<dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>${jackson.version}</version>
      <type>jar</type>
      <scope>compile</scope>
</dependency>
<dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>${jackson.version}</version>
      <type>test</type>
      <scope>compile</scope>
</dependency>

(Example 2. After Conversion)

<dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>${jackson.version}</version>
      <type>jar</type>
      <scope>compile</scope>
</dependency>

shellscript xml sed

2022-09-30 21:27

4 Answers

sed is a line-oriented command, so handling a format like XML with it is very difficult, close to impossible.
To handle XML in a shell script, you can:

  • Use XML-specific commands
  • Convert the XML to a line-oriented format before processing it
  • Write only the relevant part as a small command in a programming language that has an XML library

and so on.
XML-specific commands include xsltproc, xmllint, and xmlstarlet (which lets you edit easily with XPath).

Example using xmlstarlet:

# I have not fully checked whether this XPath meets the requirements of the question.
xmlstarlet ed -d '//dependency[type[text()="test"]]' in.xml
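
As a rough illustration of the third option (a small command written in a language with an XML library), here is a minimal Python sketch using the standard-library xml.etree.ElementTree. It assumes the requirement means "remove every <dependency> whose <type> is exactly test"; the function name remove_test_dependencies and the sample input are made up for this example.

```python
import xml.etree.ElementTree as ET


def remove_test_dependencies(root: ET.Element) -> int:
    """Remove every <dependency> whose <type> child is exactly 'test'.

    Returns the number of removed elements.
    """
    removed = 0
    # Element.remove() must be called on the parent, so visit every
    # element as a potential parent.
    for parent in list(root.iter()):
        # Copy the child list first: removing while iterating the live
        # list would skip the following sibling.
        for child in list(parent):
            if child.tag == "dependency" and (child.findtext("type") or "").strip() == "test":
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical sample input, wrapped in a <dependencies> root so it is well-formed.
SAMPLE = """<dependencies>
  <dependency>
    <artifactId>jackson-annotations</artifactId>
    <type>jar</type>
  </dependency>
  <dependency>
    <artifactId>jackson-annotations</artifactId>
    <type>test</type>
  </dependency>
</dependencies>"""

root = ET.fromstring(SAMPLE)
count = remove_test_dependencies(root)
print(count, "removed;", len(root.findall("dependency")), "left")
```

Because the parser works on the element tree rather than on lines, this does not depend on how the XML text is indented or wrapped.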


2022-09-30 21:27

This kind of filtering is easy with XSLT.

If the requirement is interpreted as

Delete the dependency element that contains "test" in the child element text.

then it can be done in a single pass with the following XSLT stylesheet.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <!-- Skip any dependency element that contains "test" in the text of a child element -->
    <xsl:template match="dependency[*[contains(., 'test')]]"/>

    <!-- Simply copy everything else -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

XSLT 1.0 features are sufficient. I ran it with Saxon, but it should also work with MSXML on Windows.
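
One thing to keep in mind with this interpretation: contains(., 'test') is a substring test on any child element, so a dependency whose artifactId merely includes "test" would be removed as well. If only the <type> value should decide, a narrower match pattern such as dependency[type = 'test'] would express that. The following Python sketch (standard library only; the helper name has_child_containing and the sample element are invented for illustration) shows the difference between the two readings.

```python
import xml.etree.ElementTree as ET


def has_child_containing(dep: ET.Element, needle: str) -> bool:
    """Roughly mimic the XPath predicate *[contains(., needle)] for direct children."""
    # "".join(itertext()) approximates the XPath string-value of an element.
    return any(needle in "".join(child.itertext()) for child in dep)


# Hypothetical input: no <type>test</type>, but "test" appears in the artifactId.
dep = ET.fromstring(
    "<dependency>"
    "<artifactId>spring-test</artifactId>"
    "<type>jar</type>"
    "</dependency>"
)

print(has_child_containing(dep, "test"))  # True: substring match on any child
print(dep.findtext("type") == "test")     # False: exact match on <type> only
```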


2022-09-30 21:28

It might be better to write a program using an XML parser library. For a minimal working Java version, something like this should be fine.

import java.io.File;
import java.io.FileOutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RemoveTestDependencies {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbfactory.newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));
            Element root = doc.getDocumentElement();
            NodeList childNodes = root.getChildNodes();
            for (int i = 0; i < childNodes.getLength(); i++) {
                Node item = childNodes.item(i);
                if ("dependencies".equals(item.getNodeName())) {
                    NodeList childNodes2 = item.getChildNodes();
                    int length = childNodes2.getLength();
                    // Iterate backwards: the NodeList is live, so removing a node
                    // shifts the indexes of the nodes after it.
                    for (int j = childNodes2.getLength() - 1; j >= 0; j--) {
                        Node item2 = childNodes2.item(j);
                        if ("dependency".equals(item2.getNodeName())) {
                            NodeList childNodes3 = item2.getChildNodes();
                            for (int k = 0; k < childNodes3.getLength(); k++) {
                                Node item3 = childNodes3.item(k);
                                if (item3 != null && "type".equals(item3.getNodeName())
                                        && item3.getTextContent() != null
                                        && item3.getTextContent().indexOf("test") >= 0) {
                                    item.removeChild(item2);
                                    break; // item2 is gone; stop scanning its children
                                }
                            }
                        }
                    }
                    System.out.println("Removed " + (length - childNodes2.getLength()) + " dependencies");
                }
            }
            TransformerFactory transFactory = TransformerFactory.newInstance();
            Transformer transformer = transFactory.newTransformer();
            DOMSource source = new DOMSource(doc);
            File newXML = new File(args[1]);
            // try-with-resources closes the output stream even if the transform fails
            try (FileOutputStream os = new FileOutputStream(newXML)) {
                StreamResult result = new StreamResult(os);
                transformer.transform(source, result);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Pass the paths of the pom.xml before and after modification as arguments.

java RemoveTestDependencies C:\test\pom.xml C:\test\pom_new.xml

I wrote this quickly, so you will need to implement proper exception handling yourself.
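
For comparison, the same procedure (parse the file, drop matching <dependency> children of the root's <dependencies>, write a new file) can be sketched in Python. One difference worth noting: ElementTree is namespace-aware, so the sketch assumes the file declares the usual Maven POM namespace http://maven.apache.org/POM/4.0.0 and qualifies every tag with it. The function name strip_test_dependencies, the file names, and the sample POM are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"  # assumption: the file uses this namespace


def strip_test_dependencies(src_path: str, dst_path: str) -> int:
    """Write src_path to dst_path without <dependency> elements whose <type> is 'test'."""
    # Register the namespace with an empty prefix so the output keeps xmlns="...".
    ET.register_namespace("", POM_NS)
    ns = {"m": POM_NS}
    tree = ET.parse(src_path)
    removed = 0
    # Only <dependencies> directly under the root, as in the Java version.
    for deps in tree.getroot().findall("m:dependencies", ns):
        # findall() returns a separate list, so removing inside the loop is safe.
        for dep in deps.findall("m:dependency", ns):
            if dep.findtext("m:type", default="", namespaces=ns).strip() == "test":
                deps.remove(dep)
                removed += 1
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
    return removed


# Demo on a hypothetical two-dependency POM written to a temporary directory.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><artifactId>a</artifactId><type>jar</type></dependency>
    <dependency><artifactId>b</artifactId><type>test</type></dependency>
  </dependencies>
</project>"""

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pom.xml")
    dst = os.path.join(tmp, "pom_new.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(SAMPLE_POM)
    print("Removed", strip_test_dependencies(src, dst), "dependencies")
```

Unlike the Java version, this matches the <type> text exactly instead of by substring; swap the comparison for an "in" test if substring matching is what you need.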


2022-09-30 21:28

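## For GNU sed
$ sed --version
sed (GNU sed) 4.4
$ sed -nr '
    /<dependency>/,/<\/dependency>/{
      # collect each line of the block in the hold space
      H
      /<\/dependency>/{
        # at the closing tag, bring the whole block into the pattern space
        x
        s/^\n//
        # print the block unless it contains <type>test</type>
        /<type>test<\/type>/!p
        # clear the hold space for the next block
        s/.*//
        h
      }
    }' dependency.xml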

## For Awk
# A regular-expression RS needs an awk that supports it, such as gawk or mawk.
$ awk -v RS='</?dependency>' '
    /<type>/ && !/<type>test<\/type>/ {
      print "<dependency>" $0 "</dependency>"
    }' dependency.xml

## For GNU grep:
$ grep --version
grep (GNU grep) 3.1
$ grep -Pzo '<dependency>((?!</dependency>)(.|\n))*?<type>(?!test<).*?</type>(.|\n)*?</dependency>\n?' dependency.xml
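
The idea behind the Awk command above (treat each <dependency>...</dependency> block as one record and keep or drop it as a unit) can also be sketched in Python with the standard re module. This is only an illustration of the text-based technique: it assumes the tags carry no attributes and that no comments or CDATA sections contain tag-like text. The names BLOCK, TEST_TYPE, and drop_test_blocks are invented for this example.

```python
import re

# One "record" per <dependency>...</dependency> block, similar to RS in the Awk version.
BLOCK = re.compile(r"<dependency>.*?</dependency>\n?", re.DOTALL)
TEST_TYPE = re.compile(r"<type>\s*test\s*</type>")


def drop_test_blocks(text: str) -> str:
    """Return text with every dependency block containing <type>test</type> removed."""
    return BLOCK.sub(
        lambda m: "" if TEST_TYPE.search(m.group(0)) else m.group(0),
        text,
    )


# Hypothetical input laid out like Example 1.
SAMPLE = (
    "<dependency>\n  <type>jar</type>\n</dependency>\n"
    "<dependency>\n  <type>test</type>\n</dependency>\n"
)
print(drop_test_blocks(SAMPLE), end="")
```

Text outside the dependency blocks passes through unchanged, but the approach is still pattern matching on raw text rather than XML parsing.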

The GNU sed version requires that the <dependency> opening and closing tags are not on the same line and that each <type> element is on a single line, which is a rather unreasonable constraint (it forces a particular formatting on the XML text).

The Awk version uses the RS variable to treat each <dependency>~</dependency> block as one logical line, and the GNU grep version uses negative lookahead (?!).

Overall, this is a case of "nothing is impossible if you really try," and I don't recommend it.


2022-09-30 21:28
