How to count the number of nouns appearing in Japanese strings with Python 3 and MeCab

Asked 2 years ago, Updated 2 years ago, 106 views

Environment: Python 3.5.2, MacOS Sierra

I read the data from an Excel file and turned it into string data.

[In]

import pandas as pd
import MeCab
import sys

df = pd.read_excel("filename.xls", sheetname=0)  # read the first sheet
df = df.dropna()  # drop empty rows
print(df)

Out

5      Security concerns
17     Convenient
24     Worried about security
28     I'm interested.
63     I don't have it.
66     I think it's convenient, but I don't have it.
...
998    It may be, but I don't want to use it too much.
1000   I think it's ...
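
In case it matters, the free-text answers are all in one column, so I can also pull them out as a plain list of strings before parsing. A minimal sketch (assuming the answers are in the first column; adjust the position for a different sheet layout):

texts = df.iloc[:, 0].astype(str).tolist()  # assumes the answer text is in the first column
print(texts[:3])  # first few answers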

"We use MeCab to extract only ""noun"" from this character data, and we consider the number of appearances as output with the following image."Could someone tell me how to write the code?

Noun         Number of appearances
Security     154
Convenient   80
Anxiety      45
Interest     20
...

Also, when I ran the code below, MeCab was able to parse the Japanese data, so I don't think there is any problem with the Japanese text encoding (MeCab prints its part-of-speech tags in Japanese, e.g. 名詞-一般 for a general noun).

[In]

m=MeCab.Tagger("-Ochasen")
for i indf:
    print(m.parse(i)) 

Out

セキュリティ  セキュリティ  セキュリティ  名詞-一般
面            メン          面            名詞-接尾-一般
において      ニオイテ      において      助詞-格助詞-連語
不安          フアン        不安          名詞-形容動詞語幹
に            ニ            に            助詞-副詞化
感じる        カンジル      感じる        動詞-自立    一段    基本形
EOS
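
For reference, each line of this -Ochasen output is tab-separated, with the part of speech in the fourth field, so the columns can be pulled apart like this (a rough check using the sentence parsed above; the exact column layout may depend on the dictionary):

m = MeCab.Tagger("-Ochasen")
for line in m.parse("セキュリティ面において不安に感じる").splitlines():
    if line == 'EOS':
        break
    fields = line.split('\t')    # surface form, reading, base form, part of speech, ...
    print(fields[0], fields[3])  # e.g. セキュリティ 名詞-一般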

python mecab

2022-09-30 19:17

1 Answer

Did you use Excel just to get the input sentences? As shown below, extract the surface form from the MeCab output string on the condition that its part of speech is a noun (名詞), append it to a list, and finally count how many times each one appears. Note that this answer uses MeCab's default output format, since there is no need for the special ChaSen format here. If you do specify -Ochasen, the if statement should be if l != 'EOS' and l.split('\t')[3][:2] == '名詞': (a sketch of that variant follows the code below).

import collections
import MeCab

m = MeCab.Tagger()

no_list = []  # list of nouns, duplicates included
for i in df:
    for l in m.parse(i).splitlines():
        # skip the EOS line and keep only entries whose part of speech is 名詞 (noun)
        if l != 'EOS' and l.split('\t')[1].split(',')[0] == '名詞':
            no_list.append(l.split('\t')[0])  # add the surface form

no_cnt = collections.Counter(no_list)  # count the occurrences of each noun

for word, cnt in no_cnt.items():
    print(word, cnt)
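
If you do want to keep the -Ochasen output, here is a minimal sketch of that variant (assuming the same df as in the question and the standard ipadic dictionary, where the part of speech is the fourth tab-separated field). Using most_common() also sorts the result so the most frequent noun comes first, like the desired output:

import collections
import MeCab

m = MeCab.Tagger("-Ochasen")

noun_list = []  # list of nouns, duplicates included
for i in df:
    for l in m.parse(i).splitlines():
        # keep only lines whose part of speech starts with 名詞 (noun)
        if l != 'EOS' and l.split('\t')[3][:2] == '名詞':
            noun_list.append(l.split('\t')[0])  # add the surface form

for word, cnt in collections.Counter(noun_list).most_common():
    print(word, cnt)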


2022-09-30 19:17


