(Edited) Data disappears when I run the stopword-removal code

Hi, everyone. Thanks to the answer to my earlier question, I was able to fix my stopword-removal code, and it now actually removes the stopwords.

But when I ran the code, I found that more than half of the data was lost after the stopwords were removed.

Here is the code.

import konlpy
import re

def tokenize_korean_text(text):
    text = re.sub(r'[^,.?!\w\s]','', text)

    okt = konlpy.tag.Okt()
    Okt_morphs = okt.pos(text)

    words = []
    for word, pos in Okt_morphs:
        if pos == 'Verb' or pos == 'Noun':
            words.append(word)

    return words


tokenized_list = []

for text in df['Keyword']:
    tokenized_list.append(tokenize_korean_text(text))

print(len(tokenized_list))
print(tokenized_list[1800])

If you run the code up to this point, you can see that tokenized_list has 1,832 entries. The output is below.

1832
["Today", "National", "Environment", "Denial", "Status", "Improvement", "Members", "Each", "Environment", "Profit", "Daehan"]

The code continues as follows.

drop_corpus = []

for index in range(len(tokenized_list)):
    corpus = tokenized_list[index]
    if len(set(corpus)) < 3:  
        review_df.drop(index, axis='index', inplace=True)
        drop_corpus.append(corpus)

for corpus in drop_corpus:
    tokenized_list.remove(corpus)

df.reset_index(drop=True, inplace=True)

Right below is the stopword-removal code.

stopwords = ["It's", "We", "We", "Hal", "Su", "We", "We", "We", "Everyone"]

clean_words = []
for i in tokenized_list:
    a = 0
    for ii in stopwords:
        if ii in i:
            a += 1
    if a < 1:
        clean_words.append(i)

If I run print(clean_words[991]) here, I get:

IndexError: list index out of range

clean_words has no index 991, and when I actually save the result, only the entries up to index 990 are stored. The rest of the data seems to have been lost.

How should I modify the code?

jupyter-notebook konlpy text-mining

2022-09-20 08:56

1 Answer

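# What your code is actually doing:
# any sublist that contains a stopword is dropped as a whole
a = [
[1,23,34],
[2,3,4,5]
]
b = [2]  # plays the role of stopwords
c = []
for i in a:
    z = 0
    for ii in b:
        if ii in i:
            z += 1
    if z < 1:
        c.append(i)
# c == [[1, 23, 34]], so the whole second sublist is gone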

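# Maybe what you want:
# keep every sublist and remove only the stopword from it
a = [
[1,23,34],
[2,3,4,5]
]
b = [2]  # plays the role of stopwords
c = []
for i in a:
    for ii in b:
        if ii in i:
            i.remove(ii)  # removes only the first occurrence of ii
    c.append(i)
# c == [[1, 23, 34], [3, 4, 5]]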
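
Applied to the names in the question, the same idea could look something like the sketch below. The sample tokens and the shortened stopword list are made up for illustration; the real tokenized_list would come from the question's tokenizer. It uses a list comprehension instead of list.remove(), since remove() only deletes the first occurrence and a token can appear more than once in one entry.

```python
# Hypothetical sample data standing in for the real tokenized_list.
tokenized_list = [
    ["Today", "We", "Environment", "Environment"],
    ["Status", "Improvement", "Members"],
    ["We", "Everyone"],
]
# Shortened, illustrative stopword list; a set keeps membership tests cheap.
stopwords = {"We", "Everyone"}

# What the loop in the question does: an entry is kept only if it
# contains no stopword at all, so most entries disappear.
dropped_whole = [
    doc for doc in tokenized_list
    if not any(word in doc for word in stopwords)
]

# Per-token filtering: every entry is kept, only stopword tokens are removed.
clean_words = [
    [token for token in doc if token not in stopwords]
    for doc in tokenized_list
]

print(len(dropped_whole), len(clean_words))
print(clean_words)
```

With per-token filtering, clean_words keeps the same length as tokenized_list, so indices still line up with the DataFrame rows. An entry that consisted only of stopwords becomes an empty list, and whether to drop such entries afterwards is a separate decision.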

2022-09-20 08:56
