Hi, everyone. Thanks to the answers to my earlier question, I was able to fix my stopword-removal code, and it now actually removes stopwords.
But when I ran the code, I found that more than half of the data was lost once the stopwords were removed.
Here is the code.
import konlpy
import re
def tokenize_korean_text(text):
    text = re.sub(r'[^,.?!\w\s]','', text)
    okt = konlpy.tag.Okt()
    Okt_morphs = okt.pos(text)
    words = []
    for word, pos in Okt_morphs:
        if pos == 'Verb' or pos == 'Noun':
            words.append(word)
    return words
tokenized_list = []
for text in df['Keyword']:
    tokenized_list.append(tokenize_korean_text(text))
print(len(tokenized_list))
print(tokenized_list[1800])
If you run it up to here, you can see that tokenized_list has 1832 entries. Below is the output.
1832
["Today", "National", "Environment", "Denial", "Status", "Improvement", "Members", "Each", "Environment", "Profit", "Daehan"]
Continuing with the code:
drop_corpus = []
for index in range(len(tokenized_list)):
    corpus = tokenized_list[index]
    if len(set(corpus)) < 3:
        review_df.drop(index, axis='index', inplace=True)
        drop_corpus.append(corpus)
for corpus in drop_corpus:
    tokenized_list.remove(corpus)
df.reset_index(drop=True, inplace=True)
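The filtering step above can also be written as a single pass that applies one boolean mask to both the raw rows and the token lists, which keeps the two aligned by position. The following is only a minimal sketch: plain Python lists stand in for the DataFrame column, and `rows`, `keep`, and the sample tokens are hypothetical names and data, not part of the original code.

```python
# Toy stand-ins (hypothetical data): rows[i] is the raw text whose tokens are tokenized_list[i].
rows = ["text a", "text b", "text c", "text d"]
tokenized_list = [
    ["env", "status", "improve"],
    ["env", "env"],
    [],
    ["member", "profit", "status", "env"],
]

# True for every document that has at least 3 distinct tokens.
keep = [len(set(corpus)) >= 3 for corpus in tokenized_list]

# Apply the same mask to both sequences so index i still refers to the same document.
rows = [row for row, k in zip(rows, keep) if k]
tokenized_list = [corpus for corpus, k in zip(tokenized_list, keep) if k]

print(len(rows), len(tokenized_list))
```

With a pandas DataFrame, the same boolean list could presumably be applied as `df = df[keep].reset_index(drop=True)`, so that the frame and `tokenized_list` are filtered by the same mask.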
Below is the stopword-removal code.
stopwords = ["It's", "We", "We", "Hal", "Su", "We", "We", "We", "Everyone"]
clean_words = []
for i in tokenized_list:
    a = 0
    for ii in stopwords:
        if ii in i:
            a += 1
    if a < 1:
        clean_words.append(i)
If I run print(clean_words[991]) at this point, I get:
IndexError: list index out of range
Index 991 does not exist, and when I actually save the result, only the entries up to the 990th are stored. The rest of the data seems to have been lost.
How should I modify the code?
jupyter-notebook konlpy text-mining
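# What your code actually does: a whole list is skipped as soon as it contains any element of b
a = [
    [1,23,34],
    [2,3,4,5]
]
b = [2]
c = []
for i in a:
    z = 0
    for ii in b:
        if ii in i:
            z += 1
    if z < 1:
        c.append(i)
# c == [[1, 23, 34]]  (the second list is dropped entirely because it contains 2)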
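To check how much of the loss comes from this whole-list filter, a small sketch like the one below counts the lists that contain at least one stopword. The sample data and the helper name `count_docs_with_stopwords` are made up for illustration and do not come from the question.

```python
def count_docs_with_stopwords(docs, stopwords):
    """Return how many token lists contain at least one stopword."""
    stop_set = set(stopwords)
    return sum(1 for doc in docs if stop_set.intersection(doc))


# Hypothetical token lists standing in for tokenized_list.
docs = [
    ["we", "environment", "status"],
    ["profit", "members"],
    ["everyone", "we"],
    ["national", "environment"],
]
stopwords = ["we", "everyone"]

dropped = count_docs_with_stopwords(docs, stopwords)
kept = len(docs) - dropped
print(dropped, kept)
```

If `dropped` is close to `len(tokenized_list) - len(clean_words)` on the real data, the missing entries are whole documents that were filtered out, not individual words.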
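# Maybe what you want: remove only the matching elements and keep every list
a = [
    [1,23,34],
    [2,3,4,5]
]
b = [2]
c = []
for i in a:
    for ii in b:
        # list.remove() deletes only the first match, so repeat until none are left
        while ii in i:
            i.remove(ii)
    c.append(i)
# c == [[1, 23, 34], [3, 4, 5]]  (note that the inner lists of a are modified in place)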
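The same idea can be applied to the variables from the question without modifying the original lists. This is a minimal sketch, assuming `tokenized_list` is a list of lists of strings; the helper name `remove_stopwords` and the sample tokens are hypothetical.

```python
def remove_stopwords(docs, stopwords):
    """Return new token lists with every stopword removed; docs is left unchanged."""
    stop_set = set(stopwords)  # set membership tests are fast
    return [[word for word in doc if word not in stop_set] for doc in docs]


# Hypothetical data standing in for the real tokenized_list and stopwords.
tokenized_list = [
    ["we", "environment", "we", "status"],
    ["profit", "members"],
    ["everyone"],
]
stopwords = ["we", "everyone"]

clean_words = remove_stopwords(tokenized_list, stopwords)
print(len(clean_words))
print(clean_words)
```

Because one output list is produced per input list, `len(clean_words)` always equals `len(tokenized_list)`, so an index such as 991 remains valid whenever it was valid before. Lists that consisted only of stopwords come back empty and can be filtered out in a separate step if needed.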
© 2024 OneMinuteCode. All rights reserved.