How do I extract stopwords from a CSV?

val['Riviews'] = val['Riviews'].str.replace("[^ㄱ-ㅎㅏ-ㅣ가-힣 ]", "", regex=True)
val['Riviews'] = val['Riviews'].replace('', np.nan)
val.head()

from konlpy.tag import Okt
okt = Okt()

I need to extract the stopwords from this data. The review text is in the 'Riviews' column of the DataFrame, and I want to find the stopwords in it and remove them.

train['tokenized'] = train['Riviews'].apply(okt.morphs)
train['tokenized'] = train['tokenized'].apply(lambda x: [item for item in x if item not in stopwords])

python nlp

2022-09-20 11:33

1 Answer

I don't think stopwords can be extracted from the data itself. A stopword list is usually just things you already know: particles, pronouns that come up too often, and so on. In the previous question, stop_words was simply hardcoded. If you find additional stopwords in the data, you add them to the list and run it again... That seems to be how it works.
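Here is a minimal sketch of that workflow, assuming the reviews are already tokenized (for example by okt.morphs). The starter stopword list, the sample reviews, and the helper names top_tokens and remove_stopwords are all made up for illustration; they are not from the question. The frequency count only suggests candidates, and you still decide by hand which ones go into the list.

```python
from collections import Counter

# Hypothetical hand-picked starter list (common Korean particles and endings).
stopwords = {"의", "가", "이", "은", "는", "를", "을", "에", "도", "하다"}

# Stand-in for train['tokenized']: each review is a list of morphemes.
tokenized_reviews = [
    ["이", "영화", "는", "정말", "재미있다"],
    ["배우", "의", "연기", "가", "정말", "좋다"],
    ["정말", "지루하다"],
]

def top_tokens(reviews, n=20):
    """Return the n most frequent tokens across all reviews."""
    counts = Counter(token for review in reviews for token in review)
    return counts.most_common(n)

def remove_stopwords(reviews, stop):
    """Drop every token that appears in the stopword set."""
    return [[t for t in review if t not in stop] for review in reviews]

# Step 1: look at the most frequent tokens to find stopword candidates.
candidates = top_tokens(tokenized_reviews, n=1)

# Step 2: after inspecting the candidates by hand, extend the list.
stopwords.add("정말")

# Step 3: filter the reviews with the updated list.
cleaned = remove_stopwords(tokenized_reviews, stopwords)
```

With a DataFrame, the same filter is what the lambda in the question already does; the only missing piece there is defining stopwords before it runs.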

2022-09-20 11:33
