I want to extract specific XML elements in GAS

Asked 1 year ago, updated 1 year ago, 460 views

I'm using GAS to write a program that watches a blog for updates and sends a notification.

https://web.plus-idea.net/2018/04/google-apps-script-xmlservice-parse/

Following this article, I was able to fetch the page content from the external site, but I can't get XmlService to parse the XML the way I expect, and I'm stuck because I can't extract the elements.

The XML I fetched looks like this:

<rdf>
  <channel></channel>
  <item><link></link><title></title></item>
  <item><link></link><title></title></item>
    :
</rdf>

Since I don't think it's good to hit the other site many times, I'm testing with a hard-coded string in the same format as above.
However, the rootDoc.getChildren() call in the code below returns an array of length 0, so I can't retrieve the child elements.

I don't really understand namespaces, but logging rootDoc shows:

 [Element:<rdf:RDF [Namespace:http://www.w3.org/1999/02/22-rdf-syntax-ns#]/>]

So I tried specifying that namespace URL in my test copy of the blog feed, but it didn't work.
What puzzles me is that rootDoc itself is printed like the line above, and I don't know how to get at its contents.

Below is my test code:

function myFunction(){
  const content=`
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cc="http://web.resource.org/cc/" xmlns:atom="http://www.w3.org/2005/Atom" xml:lang="ja">
  <channel rdf:about="http://google.com">
    ...
  </channel>
  <item rdf:about="http://google.com">
    <link>http://google.com</link>
    <title>Title</title>
  </item>
  <item rdf:about="http://google.com">
    <link>http://google.com</link>
    <title>Title</title>
  </item>
</rdf:RDF>
`

  var xmlDoc = XmlService.parse(content);
  var rootDoc = xmlDoc.getRootElement();
  Logger.log(rootDoc);
  var ns = XmlService.getNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
  var items = rootDoc.getChildren('item', ns);
  Logger.log(items.length);
  for (var i = 0; i < items.length; i++) {
    console.log(items[i].getText());
    var title = items[i].getChild("title").getText();
    var url = items[i].getChild("link").getText();
    var text = title + ' ' + url;
    Logger.log(text);
  }
}

Execution log

20:51:22 Notice Execution started
20:51:23 Info [Element:<rdf:RDF [Namespace:http://www.w3.org/1999/02/22-rdf-syntax-ns#]/>]
20:51:23 Info 0.0
20:51:24 Notice Execution completed

javascript google-apps-script xml

2022-10-25 11:42

1 Answer

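This XML document declares a default namespace with xmlns="http://purl.org/rss/1.0/". A default namespace applies to every element without a prefix, so channel, item, title and link all belong to http://purl.org/rss/1.0/, not to the rdf namespace. Only the root element rdf:RDF is in the rdf namespace.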
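A quick way to confirm this in Apps Script is to log every child of the root element together with its namespace URI. This is a minimal sketch: describeChildren is a hypothetical helper name, and it relies only on the XmlService Element methods getChildren(), getName() and getNamespace(), plus Namespace.getURI().

```javascript
/**
 * Hypothetical helper: returns "name -> namespace URI" for each direct child element.
 * getChildren() with no arguments returns every child element regardless of namespace,
 * so the output shows which Namespace to pass to getChildren(name, namespace).
 */
function describeChildren(element) {
  var lines = [];
  var children = element.getChildren();
  for (var i = 0; i < children.length; i++) {
    lines.push(children[i].getName() + ' -> ' + children[i].getNamespace().getURI());
  }
  return lines;
}

// Usage inside Apps Script (XmlService only exists there):
// var root = XmlService.parse(content).getRootElement();
// Logger.log(describeChildren(root));
```

With the question's test XML, channel and both item elements should be reported under http://purl.org/rss/1.0/, which is why getChildren('item', ns) with the rdf namespace finds nothing.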

Therefore, create the namespace you pass to getChildren from this URI:

var ns = XmlService.getNamespace("http://purl.org/rss/1.0/");

The getChild calls for title and link need the same ns as well, for example items[i].getChild("title", ns).
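Putting the answer together, here is a minimal sketch of the loop with the RSS namespace applied to item, title and link. extractItems is a hypothetical helper name; it takes the parsed root element and a Namespace so the parsing and the loop stay separate, and the feed string in myFunction is a shortened version of the question's test XML.

```javascript
/**
 * Hypothetical helper: returns "title url" strings for every <item> under an RSS 1.0 root.
 * Assumes the feed declares xmlns="http://purl.org/rss/1.0/", so item, title and link
 * are in that namespace rather than in the rdf: namespace of the root element.
 */
function extractItems(root, ns) {
  var results = [];
  var items = root.getChildren('item', ns);
  for (var i = 0; i < items.length; i++) {
    // title and link inherit the default namespace, so they need ns too
    var titleEl = items[i].getChild('title', ns);
    var linkEl = items[i].getChild('link', ns);
    if (!titleEl || !linkEl) continue; // skip items missing either element
    results.push(titleEl.getText() + ' ' + linkEl.getText());
  }
  return results;
}

// Apps Script entry point; XmlService and Logger are only available inside Apps Script.
function myFunction() {
  var content =
    '<rdf:RDF xmlns="http://purl.org/rss/1.0/" ' +
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">' +
    '<item rdf:about="http://google.com"><link>http://google.com</link><title>Title</title></item>' +
    '</rdf:RDF>';
  var root = XmlService.parse(content).getRootElement();
  var ns = XmlService.getNamespace('http://purl.org/rss/1.0/');
  var lines = extractItems(root, ns);
  for (var i = 0; i < lines.length; i++) {
    Logger.log(lines[i]);
  }
}
```

Passing the rdf namespace instead of the RSS one to extractItems would return an empty array, which matches the 0 logged in the question.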


2022-10-25 11:42


