Understanding Persistence of XML Files

Asked 2 years ago, Updated 2 years ago, 88 views

I want to edit a DNA sequence from DDBJ.
I can download it in XML format, but I am not familiar with the structure of XML, so
there are some things I don't understand.

Below is part of the XML file downloaded from DDBJ.
[Image: excerpt of the XML file]

This XML file seems to use separate tags for the name and the value of each piece of information.
In an ordinary XML file (from what I have looked up myself), the name would be put in the tag as an attribute, right?

Is there any advantage to structuring it this way?
Also, I have only ever parsed XML where the information is stored in tag attributes, so
could you tell me a good way to parse this?
I am assuming Python as the programming language.

python xml

2022-09-30 11:38

1 Answer

I don't really know what the benefits of this format are, but you should be able to parse it with the following code. Part of the file shown above was saved as "dbbj_example.xml".

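>>> import lxml.etree
>>> dbbj = lxml.etree.parse("dbbj_example.xml")
>>> # Find the name element whose text is "organism", go up to its parent, then take the value element
>>> dbbj.xpath('//INSDQualifier_name[text()="organism"]/..//INSDQualifier_value')[0].text
Output: 'Acropora cervicornis'

>>> # List the text of every name element in the file
>>> [x.text for x in dbbj.findall('.//INSDQualifier_name')]
Output: ['organism', 'organelle', 'mol_type', 'db_xref']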

I used the etree module and the XPath syntax of the lxml library.

lxml: http://lxml.de/index.html
Example xpath syntax: https://docs.python.org/3/library/xml.etree.elementtree.html#elementtree-xpath
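
If you would rather not install lxml, the same idea can be sketched with the standard library. The snippet below is only an illustration: the sample XML is a hypothetical reconstruction of the name/value layout discussed above (I am assuming each name/value pair sits inside an INSDQualifier element), and the attribute-style tag names (quals, qualifier) are made up to show the contrast the question asks about.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the element-style layout (assumed structure;
# the real DDBJ file contains many more elements).
ELEMENT_STYLE = """\
<INSDFeature_quals>
  <INSDQualifier>
    <INSDQualifier_name>organism</INSDQualifier_name>
    <INSDQualifier_value>Acropora cervicornis</INSDQualifier_value>
  </INSDQualifier>
  <INSDQualifier>
    <INSDQualifier_name>mol_type</INSDQualifier_name>
    <INSDQualifier_value>genomic DNA</INSDQualifier_value>
  </INSDQualifier>
</INSDFeature_quals>
"""

# The same data in attribute style (made-up tag names, for comparison only).
ATTRIBUTE_STYLE = """\
<quals>
  <qualifier name="organism">Acropora cervicornis</qualifier>
  <qualifier name="mol_type">genomic DNA</qualifier>
</quals>
"""


def element_style_to_dict(root):
    """Map the text of each name element to the text of its sibling value element."""
    result = {}
    for qual in root.iter("INSDQualifier"):
        name = qual.findtext("INSDQualifier_name")
        value = qual.findtext("INSDQualifier_value")  # None if the value tag is absent
        if name is not None:
            result[name] = value
    return result


def attribute_style_to_dict(root):
    """Map each 'name' attribute to the text of its element."""
    return {q.get("name"): q.text for q in root.iter("qualifier")}


by_element = element_style_to_dict(ET.fromstring(ELEMENT_STYLE))
by_attribute = attribute_style_to_dict(ET.fromstring(ATTRIBUTE_STYLE))

print(by_element["organism"])
print(by_element == by_attribute)
```

Either layout ends up as the same dictionary, so the parsing effort is similar. Note that a plain dictionary keeps only the last value when a name appears more than once; if names can repeat in your file, collecting the values into lists would be safer.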


2022-09-30 11:38
