MeCab word segmentation is too slow to finish

Asked 2 years ago, Updated 2 years ago, 133 views

I read documents from a TSV file and run code that uses MeCab to split them into words, keeping the base form for certain parts of speech. It is so heavy that it still hasn't finished after more than 5 hours.
My PC's specs aren't great, but this never happened when I ran other code, so I'd like to ask whether something is wrong with my code.

import csv

import MeCab
from gensim.models.doc2vec import TaggedDocument

mt = MeCab.Tagger()  # assumed: default tagger (IPADIC-style feature strings)
reports = []

with open("jurycomment2.tsv", mode='r', encoding='utf-8') as f:
    # Each line holds a review ID and the review text, separated by a tab.
    reader = csv.reader(f, delimiter="\t")
    for report_id, report in reader:
        words = []
        node = mt.parseToNode(report)
        while node:
            if node.feature.split(",")[0] == "名詞":  # noun
                words.append(node.surface)
            elif node.feature.split(",")[0] == "形容詞":  # adjective
                words.append(node.feature.split(",")[6])
            elif node.feature.split(",")[0] == "動詞":  # verb
                words.append(node.feature.split(",")[6])
                node = node.next
        stopword = []
        words2 = [token for token in words if token not in stopword]
        # words is the list of words in a sentence; tags holds the sentence ID.
        reports.append(TaggedDocument(words=words2, tags=[report_id]))

The TSV file is only about 500 KB, so I don't think its size is the cause.
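For reference, the code above depends on the layout of MeCab's feature string. Assuming the IPADIC dictionary (other dictionaries such as UniDic use a different field layout), field 0 is the part of speech and field 6 is the base form, which is what split(",")[6] extracts. The sketch below uses hand-written, illustrative feature strings rather than real output from the question's data; the helper name pos_and_base_form is hypothetical.

```python
# Minimal sketch, assuming IPADIC-style feature strings:
# field 0 = part of speech, field 6 = base form.
def pos_and_base_form(feature):
    """Return (part of speech, base form) from one IPADIC feature string."""
    fields = feature.split(",")
    return fields[0], fields[6]


# Illustrative feature strings for the inflected forms 走っ and 高かっ.
samples = {
    "走っ": "動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ",
    "高かっ": "形容詞,自立,*,*,形容詞・アウオ段,連用タ接続,高い,タカカッ,タカカッ",
}

for surface, feature in samples.items():
    pos, base = pos_and_base_form(feature)
    print(surface, pos, base)
# prints:
# 走っ 動詞 走る
# 高かっ 形容詞 高い
```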

python natural-language-processing

2022-09-30 20:11

1 Answer

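The line node = node.next is indented inside the elif branch for verbs, so the loop only moves to the next node when the current node is a verb. The first node returned by parseToNode is the BOS/EOS node, which is not a noun, adjective, or verb, so node never changes and the while loop never ends. Dedent node = node.next to the same level as the if/elif chain so that it runs on every iteration.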
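A minimal sketch of the corrected loop, pulled out into a function so the fix is easy to see. It assumes IPADIC-style feature strings (part of speech in field 0, base form in field 6). MeCab is not imported here: the FakeNode class and build_chain helper are hypothetical stand-ins that mimic the surface, feature, and next attributes of a MeCab node, only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List, Optional


def extract_words(node) -> List[str]:
    """Collect nouns (surface form) and adjectives/verbs (base form)
    from a MeCab-style node chain, assuming IPADIC feature layout."""
    words = []
    while node:
        fields = node.feature.split(",")
        pos = fields[0]
        if pos == "名詞":  # noun: keep the surface form
            words.append(node.surface)
        elif pos in ("形容詞", "動詞"):  # adjective / verb: keep the base form
            words.append(fields[6])
        node = node.next  # advance on every iteration, outside the if/elif chain
    return words


# Hypothetical stand-in for MeCab's node object, for demonstration only.
@dataclass
class FakeNode:
    surface: str
    feature: str
    next: Optional["FakeNode"] = None


def build_chain(pairs):
    """Link (surface, feature) pairs into a FakeNode chain, first pair at the head."""
    head = None
    for surface, feature in reversed(pairs):
        head = FakeNode(surface, feature, head)
    return head


chain = build_chain([
    ("", "BOS/EOS,*,*,*,*,*,*,*,*"),
    ("犬", "名詞,一般,*,*,*,*,犬,イヌ,イヌ"),
    ("が", "助詞,格助詞,一般,*,*,*,が,ガ,ガ"),
    ("走っ", "動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ"),
    ("た", "助動詞,*,*,*,特殊・タ,基本形,た,タ,タ"),
    ("", "BOS/EOS,*,*,*,*,*,*,*,*"),
])
print(extract_words(chain))  # ['犬', '走る']
```

With a real tagger, the call inside the question's loop would be words = extract_words(mt.parseToNode(report)). Keeping the node advance as the last statement of the loop body, rather than inside one branch, is what guarantees the loop terminates.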


2022-09-30 20:11



© 2024 OneMinuteCode. All rights reserved.