How do I remove stopwords from a CSV?

Asked 2 years ago, Updated 2 years ago, 53 views

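import numpy as np

# Keep only Hangul characters and spaces (regex=True is required in recent pandas)
val['Riviews'] = val['Riviews'].str.replace("[^ㄱ-ㅎㅏ-ㅣ가-힣 ]", "", regex=True)
# Reviews that are now empty strings become NaN
val['Riviews'] = val['Riviews'].replace('', np.nan)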
val.head()

from konlpy.tag import Okt
okt = Okt()

I need to pull the stopwords out of this data. The reviews are in the Riviews column of the df, and I want to find the stopwords in that column and remove them.

train['tokenized'] = train['Riviews'].apply(okt.morphs)
train['tokenized'] = train['tokenized'].apply(lambda x: [item for item in x if item not in stopwords])

python nlp

2022-09-20 11:33

1 Answer

I don't think stopwords are something you extract from the data. You usually just start from words you already know are uninformative: particles, pronouns that show up too often, and so on. In the previous question, stop_words was simply hardcoded. If you find additional stopwords in your data, add them to that list and... that is about how it goes.
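To make that concrete, here is a minimal sketch of the hardcoded-list approach. The stopword list, the sample reviews, and the helper names `top_tokens` and `remove_stopwords` are all made up for illustration; in the question the token lists would come from `okt.morphs`, which is skipped here so the snippet runs without KoNLPy installed.

```python
from collections import Counter

# Hypothetical hand-written stopword list (common Korean particles).
stopwords = {"이", "가", "은", "는", "을", "를", "의", "에", "도"}

# Hypothetical pre-tokenized reviews; in the question these come from okt.morphs.
tokenized_reviews = [
    ["이", "영화", "는", "정말", "재미있다"],
    ["배우", "의", "연기", "가", "정말", "좋다"],
    ["스토리", "는", "정말", "지루하다"],
]

def top_tokens(docs, n):
    """Return the n most frequent tokens as candidates to review by hand."""
    counts = Counter(token for doc in docs for token in doc)
    return [token for token, _ in counts.most_common(n)]

def remove_stopwords(docs, stopwords):
    """Drop every token that appears in the stopword set."""
    return [[token for token in doc if token not in stopwords] for doc in docs]

# Look at frequent tokens, then decide manually which ones to add to the list.
candidates = top_tokens(tokenized_reviews, 3)
extended_stopwords = stopwords | {"정말"}  # added after inspecting the candidates

cleaned = remove_stopwords(tokenized_reviews, extended_stopwords)
print(candidates)
print(cleaned)
```

The frequency count only suggests candidates. Whether a frequent word is really a stopword still depends on your task, so the final list stays a manual decision.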


2022-09-20 11:33
