I want to extract the abstract <text> in the <passage> elements of the XML file named result.xml.
For <text>, I want to extract the "Neurological complications of COVID-19 ... encephalopathy." and "The rapid evolution ... replication." statements.
The search criterion is that the <text> statement contains either of the words COVID-19 or SARS-CoV-2.
File: result.xml
<collection>
<document>
<passage>
<infon key="authors">Gupta NA,Lien C,IvM,</infon>
<offset>0</offset>
<text>Critical illness-associated cerebral microbleeds in severe COVID-19 infection</text>
<annotation id="5">
<location offset="68" length="9"/>
<text>infection</text>
</annotation>
</passage>
<passage>
<infon key="section_type">ABSTRACT</infon>
<infon key="type">abstract</infon>
<offset>81</offset>
<text>Neurological complications of COVID-19 infection have been recently described and include dizziness, headache, loss of taste and smell, stroke, and encephalopathy.</text>
</passage>
<passage>
<infon key="section_type">ABSTRACT</infon>
<infon key="type">abstract_title_1</infon>
<offset>584</offset>
<text>Highlights</text>
</passage>
</document>
<document>
<passage>
<infon key="name_4">surname:Ansari;given-names:M.Azim</infon>
<offset>0</offset>
<text>Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA</text>
<annotation id="1">
<infon key="identifier">9606</infon>
<infon key="type">Species</infon>
<location offset="67" length="9"/>
<text>mammalian</text>
</annotation>
</passage>
<passage>
<infon key="type">abstract</infon>
<offset>191</offset>
<text>The rapid evolution of RNA viruses SARS-CoV-2 has been long considered to result from a combination of high copying error frequencies during RNA replication.</text>
</passage>
<passage>
<infon key="section_type">ABSTRACT</infon>
<infon key="type">abstract_title_1</infon>
<offset>2033</offset>
<text>Author summary</text>
</passage>
</document>
</collection>
Script: 1.py
from bs4 import BeautifulSoup

with open('result.xml') as xml:
    soup = BeautifulSoup(xml, 'xml')

result = []
for passage in soup.find_all('passage'):
    text = passage.text
    if text and ('COVID-19' in text or 'SARS-CoV-2' in text):
        for line in text.splitlines():
            if line.strip().endswith('.'):
                result.append(line)
print(*result, sep='\n')
Use BeautifulSoup.
from bs4 import BeautifulSoup

# Parse the file with the XML parser (requires lxml).
with open('result.xml') as xml:
    soup = BeautifulSoup(xml, 'xml')

result = []
for passage in soup.find_all('passage'):
    # .text joins every string inside the passage, one element per line here.
    text = passage.text
    if text and ('COVID-19' in text or 'SARS-CoV-2' in text):
        for line in text.splitlines():
            # Keep only full sentences, which skips titles, offsets and infon values.
            if line.strip().endswith('.'):
                result.append(line)
print(*result, sep='\n')
# Neurological complications of COVID-19 infection have been recently described and include dizziness, headache, loss of taste and smell, stroke, and encephalopathy.
# The rapid evolution of RNA viruses SARS-CoV-2 has been long considered to result from a combination of high copying error frequencies during RNA replication.
In the condition "if text and (...", short-circuit evaluation prevents the following "in ..." tests from raising errors when text is None (if the passage does not contain a text tag).
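The guard can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named mentions_covid that is not part of the original script; it only shows how the "and" stops before the "in" tests when the value is None or empty.

```python
def mentions_covid(text):
    """Return True if text is a non-empty string naming either keyword.

    mentions_covid is a hypothetical helper used only for illustration.
    """
    # `text and (...)` stops at `text` when it is None or '', so the
    # `in` tests on the right are never evaluated for those values.
    return bool(text and ('COVID-19' in text or 'SARS-CoV-2' in text))


print(mentions_covid(None))
print(mentions_covid(''))
print(mentions_covid('SARS-CoV-2 is an RNA virus.'))
```

Without the leading "text and", the expression 'COVID-19' in None would raise a TypeError, which is the error the guard avoids.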
Alternatively, use a CSS selector with BeautifulSoup.
from bs4 import BeautifulSoup

with open('result.xml') as xml:
    soup = BeautifulSoup(xml, 'xml')

# Select <text> siblings of a type infon that contains "abstract" but not "_title",
# keeping only those that mention either keyword.
texts = soup.select('''
    passage >
    infon[key="type"]:-soup-contains("abstract"):not(:-soup-contains("_title")) ~
    text:-soup-contains("COVID-19", "SARS-CoV-2")
''')
text = [t.text for t in texts]
print('\n'.join(text))
# Neurological complications of COVID-19 infection have been recently described and include dizziness, headache, loss of taste and smell, stroke, and encephalopathy.
# The rapid evolution of RNA viruses SARS-CoV-2 has been long considered to result from a combination of high copying error frequencies during RNA replication.
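The same filter can also be spelled out step by step. The following is a rough sketch that uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it is a swapped-in technique rather than the approach shown above. The helper name extract_abstracts, the KEYWORDS tuple, and the trimmed inline sample are assumptions made for illustration, and the type check is simplified to an exact match on "abstract".

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as result.xml (illustration only).
SAMPLE = """<collection>
<document>
<passage>
<infon key="type">abstract</infon>
<offset>191</offset>
<text>The rapid evolution of RNA viruses SARS-CoV-2 has been long considered to result from a combination of high copying error frequencies during RNA replication.</text>
</passage>
<passage>
<infon key="section_type">ABSTRACT</infon>
<infon key="type">abstract_title_1</infon>
<offset>2033</offset>
<text>Author summary</text>
</passage>
</document>
</collection>"""

KEYWORDS = ('COVID-19', 'SARS-CoV-2')  # assumed name for the search terms


def extract_abstracts(root):
    """Collect <text> from passages whose type infon is exactly 'abstract'."""
    found = []
    for passage in root.iter('passage'):
        # Only direct children are examined, so infon and text elements
        # nested inside an annotation are ignored.
        kind = passage.findtext("infon[@key='type']")
        text = passage.findtext('text')  # None when there is no <text> child
        if kind == 'abstract' and text and any(k in text for k in KEYWORDS):
            found.append(text)
    return found


print('\n'.join(extract_abstracts(ET.fromstring(SAMPLE))))
```

To run it against the real file, ET.parse('result.xml').getroot() could be passed in place of ET.fromstring(SAMPLE). Here findtext really does return None for a passage without a text child, so the "text and" guard is doing actual work.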