Environment: Python 3.5.2, MacOS Sierra
I read the data from an Excel file and got the following text data.
[In]
import pandas as pd
import MeCab
import sys
df=pd.read_excel("filename.xls", sheetname=0)
df = df.dropna()
print(df)
[Out]
5 Security concerns
17 Convenient
24 Worry about security
28 I'm interested.
63 I don't have it.
66 I think it's convenient, but I don't have it.
...
998 It may be, but I don't want to use it too much.
I think it's a thousand
I want to use MeCab to extract only the nouns from this text data and output the number of times each one appears, as in the image below. Could someone tell me how to write the code?
Word Count
Security 154
Convenient 80
Anxiety 45
Interest 20
...
Also, when I tried the code below, MeCab parsed the Japanese data correctly, so I don't think there is a problem with the Japanese encoding.
[In]
m = MeCab.Tagger("-Ochasen")
for i in df:
    print(m.parse(i))
[Out]
Security Security Noun - General
Facial nouns - suffixes - general
in the odor of in the particle-case particle-conjunctive
Anxiety Juan Anxiety Noun - Adjective Verb Stem
(d) Postpositions - Adverbization
Feel Kanzil Feel Verbs - Self-reliant Basic Form
EOS
Are you using Excel only to get the input sentences? As shown below, extract the words from the MeCab output string on the condition that they are nouns, put them in a list, and finally count the number of times each one appears. Note that this answer uses MeCab's default output format, because there is no need to output in the ChaSen format. If you specify -Ochasen, the if statement should be:
if l != 'EOS' and l.split('\t')[3][:2] == '名詞':
import collections

m = MeCab.Tagger()
noun_list = []  # List of nouns, duplicates included
for i in df:
    for l in m.parse(i).splitlines():
        # Keep nouns only, skipping the EOS line ('名詞' is the tag MeCab prints for nouns)
        if l != 'EOS' and l.split('\t')[1].split(',')[0] == '名詞':
            noun_list.append(l.split('\t')[0])  # Add the headword (surface form)
noun_cnt = collections.Counter(noun_list)  # Count up each noun
for word, cnt in noun_cnt.items():
    print(word, cnt)
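For reference, the same idea can be wrapped in small functions that handle both the default and the -Ochasen output formats. This is only a sketch under some assumptions: the helper names extract_nouns, count_nouns, and run_on_excel are made up for illustration, the dictionary is assumed to tag nouns as '名詞' (as IPADIC does), and the text is assumed to sit in the first column of the sheet. Note that iterating over a DataFrame directly (for i in df) yields its column labels, so the sketch reads the values of the first column with df.iloc[:, 0] instead.

```python
import collections


def extract_nouns(parsed, chasen=False):
    """Return the surface forms of the nouns in one MeCab parse result.

    parsed: the string returned by Tagger.parse().
    chasen: set to True if the Tagger was created with "-Ochasen".
    """
    nouns = []
    for line in parsed.splitlines():
        fields = line.split('\t')
        if line == 'EOS' or len(fields) < 2:
            continue  # skip the end-of-sentence marker and malformed lines
        if chasen:
            # ChaSen format: surface, reading, base form, part of speech, ...
            pos = fields[3] if len(fields) > 3 else ''
        else:
            # Default format: surface <TAB> comma-separated features
            pos = fields[1].split(',')[0]
        if pos.startswith('名詞'):  # assumption: nouns are tagged '名詞'
            nouns.append(fields[0])
    return nouns


def count_nouns(texts, parse, chasen=False):
    """Count nouns over many texts; parse is e.g. MeCab.Tagger().parse."""
    counter = collections.Counter()
    for text in texts:
        counter.update(extract_nouns(parse(str(text)), chasen))
    return counter


def run_on_excel(path):
    """Hypothetical entry point: count nouns in the first column of a sheet."""
    import MeCab          # imported here so the helpers work without MeCab
    import pandas as pd

    df = pd.read_excel(path).dropna()  # reads the first sheet by default
    tagger = MeCab.Tagger()
    counts = count_nouns(df.iloc[:, 0], tagger.parse)
    # most_common() sorts by frequency, highest first
    for word, cnt in counts.most_common():
        print(word, cnt)
```

Calling run_on_excel("filename.xls") would print the nouns in descending order of frequency, which matches the layout asked for in the question. Keeping the parsing separate from MeCab itself also makes it easy to check the noun filter against a saved parse result.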