Python MeCab binding

Asked 2 years ago, Updated 2 years ago, 92 views

I'm setting up an environment to use MeCab from Python, but the parse call on the last line of the code below raises the following error:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

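import MeCab

# -Owakati selects wakati (space-separated) output; the option needs its leading hyphen
tagger = MeCab.Tagger('-Owakati')
tagger.parse('')  # workaround explained below

text = 'Natural language processing is fun'
result = tagger.parse(text)  # the UnicodeDecodeError is raised here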

My development environment is OS: Windows 7 32-bit, Python 3.6.0 (Anaconda3 4.3.1), MeCab 0.996. For mecab-python, I installed the mecab-python-windows package with pip, following the instructions here.
The tagger.parse('') call is there because I read that it is needed to keep the parsed data from being collected by Python's GC.

Has anyone run into a similar problem and managed to resolve it?

python3 mecab

2022-09-30 19:43

1 Answer

I solved it myself. Sorry for the trouble.

When I first installed MeCab itself, I did not select UTF-8 as the character encoding. I assumed UTF-8 would still work because I later ran "Recompile UTF-8 dictionary" from the Start menu. However, after reinstalling and selecting UTF-8 as the character encoding during installation, the behavior changed and it worked fine. Apparently, "Recompile xxxx dictionary" did not take effect.
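For anyone who wants to check which encoding their dictionary actually uses before reinstalling, here is a minimal sketch. It assumes the binding exposes dictionary_info() with filename, charset and next fields, as the standard MeCab Python binding does. The helper names dictionary_charsets and is_utf8 are my own, not part of MeCab, and I have not verified this against the mecab-python-windows package specifically.

```python
def dictionary_charsets(tagger):
    """Return (filename, charset) for every dictionary the tagger loaded.

    Assumes tagger.dictionary_info() returns a linked list of entries
    with .filename, .charset and .next attributes.
    """
    found = []
    info = tagger.dictionary_info()
    while info:
        found.append((info.filename, info.charset))
        info = info.next
    return found


def is_utf8(charset):
    """Hypothetical helper: treat 'UTF-8', 'utf8', 'UTF_8' etc. as UTF-8."""
    return charset.lower().replace("-", "").replace("_", "") == "utf8"


def main():
    try:
        import MeCab
        tagger = MeCab.Tagger("-Owakati")
    except (ImportError, RuntimeError) as exc:
        # MeCab is not installed, or no dictionary could be loaded
        print("Could not create a MeCab tagger:", exc)
        return
    for filename, charset in dictionary_charsets(tagger):
        status = "OK" if is_utf8(charset) else "not UTF-8"
        print(f"{filename}: {charset} ({status})")


if __name__ == "__main__":
    main()
```

If a dictionary is reported with a charset other than UTF-8, that would be consistent with the UnicodeDecodeError above, since the Python side tries to decode MeCab's output as UTF-8.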

Thank you for your reply, Kenjinoguchi.


2022-09-30 19:43



© 2024 OneMinuteCode. All rights reserved.