Working with XML Files

Asked 1 year ago, Updated 1 year ago, 380 views

I want to write a program that extracts the text at a particular location in an XML file and saves it to a text file. It should then find the terms listed in a CSV file that also occur in that text (the co-occurring terms), and output the number of occurrences of each term together with the id assigned to it, as well as the total count for each id.

Example execution results

(Common term, number of occurrences, id assigned to the term)

    acute          2    0000
    distress       1    0000
    coronavirus    1    1111
    China          2    1111

(id, total number of occurrences)

    0000    3
    1111    3

Source Code

from bs4 import BeautifulSoup
import csv

# Load the XML file
with open('ab36_37.xml', 'r', encoding='utf-8') as xml:
    soup = BeautifulSoup(xml, 'xml')

# Extract the text of paragraph-type passages that mention SARS-CoV-2
texts = soup.select('''
passage >
  infon[key="type"]:-soup-contains("paragraph") ~ text:-soup-contains("SARS-CoV-2")
''')
text = [t.text for t in texts]

# Save the results to the specified file
# (the with blocks close both files, so no explicit close() is needed)
with open('re_ab3637.txt', 'w') as txt:
    print('\n'.join(text), file=txt)
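
For comparison, the same kind of extraction can be sketched with only the standard library. This is a minimal sketch, not the original approach: it assumes each passage element holds an infon element with key="type" and a text element, and both the sample XML and the helper name extract_paragraphs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real ab36_37.xml is only assumed to look like this.
SAMPLE_XML = """<collection>
  <document>
    <passage>
      <infon key="type">title</infon>
      <text>SARS-CoV-2 overview</text>
    </passage>
    <passage>
      <infon key="type">paragraph</infon>
      <text>Severe acute respiratory distress caused by SARS-CoV-2.</text>
    </passage>
    <passage>
      <infon key="type">paragraph</infon>
      <text>An unrelated paragraph.</text>
    </passage>
  </document>
</collection>"""


def extract_paragraphs(root, keyword):
    """Return the text of every paragraph-type passage that mentions keyword."""
    found = []
    for passage in root.iter('passage'):
        # Look up the infon child that records the passage type
        infon = passage.find("infon[@key='type']")
        text = passage.findtext('text', default='')
        if infon is not None and infon.text == 'paragraph' and keyword in text:
            found.append(text)
    return found


root = ET.fromstring(SAMPLE_XML)
paragraphs = extract_paragraphs(root, 'SARS-CoV-2')
print('\n'.join(paragraphs))
```

The title passage is skipped even though it contains the keyword, because only passages whose type is paragraph are kept.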

Example CSV file

0000,acute
0000, distress
1111, coronavirus
1111, China

Text File Example

Severe acute response distress syndrome due to acute coronavirus (SARS-CoV-2), which was first diagnosed in China, China in December 2019.

python python3

2022-11-03 00:01

1 Answer

So you just want to count how often each word from the CSV appears in the sample text, right?
Calling it "co-occurrence" makes it sound harder than it is.
Python strings have a method that counts how many times a substring occurs in a string, so this is fairly easy to do.
https://hibiki-press.tech/python/count/103
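
As a small illustration of that method, here is str.count applied to the sample sentence from the question. One thing to keep in mind is that it matches substrings, not whole words; the helper count_whole_word is a hypothetical addition that shows one way to require word boundaries with a regular expression.

```python
import re

text = ('Severe acute response distress syndrome due to acute coronavirus '
        '(SARS-CoV-2), which was first diagnosed in China, China in December 2019.')

# str.count counts non-overlapping substring matches and is case-sensitive
acute_count = text.count('acute')
china_count = text.count('China')

# Substring matching also hits longer words: 'corona' is found inside 'coronavirus'
corona_substring = text.count('corona')


def count_whole_word(word, text):
    """Count occurrences of word that are delimited by word boundaries."""
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text))


corona_whole = count_whole_word('corona', text)

print(acute_count, china_count)        # 2 2
print(corona_substring, corona_whole)  # 1 0
```

If the terms in the CSV may be parts of longer words, the word-boundary version is probably the safer choice.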

Rename the files as appropriate.

def main():
    # Read the CSV of ids and words, one entry per line
    with open('words.csv', 'r') as f:
        rows = f.readlines()

    # Read the whole text to search in
    with open('re_ab3637.txt', 'r') as f:
        text = f.read()

    # Map of id => count
    id_count = {}

    with open('result1.csv', 'w') as f:
        for row in rows:
            # Split the line "id,word" into its two fields
            tmp = row.split(',')
            id = tmp[0].strip()
            # Strip surrounding whitespace, including the trailing newline
            word = tmp[1].strip()
            # Count how many times the word occurs in the text
            count = text.count(word)
            f.write('%s, %d, %s\n' % (word, count, id))

            # If the id is already present, add to its count
            if id in id_count:
                id_count[id] += count
            else:  # Otherwise, create an entry
                id_count[id] = count

    # Output id => count
    with open('result2.csv', 'w') as f:
        for id, count in id_count.items():
            f.write('%s, %d\n' % (id, count))


if __name__ == '__main__':
    main()
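
A variation on the same idea, in case it is useful: the sketch below reads the CSV with the csv module and accumulates totals in a defaultdict. It is only a sketch under assumptions; the function name count_terms is made up, and it works on in-memory strings via io.StringIO so it can be tried without any files.

```python
import csv
import io
from collections import defaultdict


def count_terms(csv_file, text):
    """Return (per-term rows, per-id totals) for the terms listed in csv_file."""
    rows = []
    id_count = defaultdict(int)
    # skipinitialspace drops the blank after a comma, as in "0000, distress"
    for record in csv.reader(csv_file, skipinitialspace=True):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        term_id, word = record[0].strip(), record[1].strip()
        count = text.count(word)  # substring count, as in the code above
        rows.append((word, count, term_id))
        id_count[term_id] += count
    return rows, dict(id_count)


# Sample data copied from the question
sample_csv = io.StringIO('0000,acute\n0000, distress\n1111, coronavirus\n1111, China\n')
sample_text = ('Severe acute response distress syndrome due to acute coronavirus '
               '(SARS-CoV-2), which was first diagnosed in China, China in December 2019.')

rows, totals = count_terms(sample_csv, sample_text)
for word, count, term_id in rows:
    print('%s, %d, %s' % (word, count, term_id))
for term_id, count in totals.items():
    print('%s, %d' % (term_id, count))
```

To use it with real files, pass an open file object instead of the StringIO, for example open('words.csv', newline='').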

result1.csv

acute, 2, 0000
distress, 1, 0000
coronavirus, 1, 1111
China, 2, 1111

result2.csv

0000, 3
1111, 3


2022-11-03 00:01

© 2024 OneMinuteCode. All rights reserved.