I'm setting up an environment to use MeCab from Python, but the parse call on the last line of the code below raises this error:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
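import MeCab

tagger = MeCab.Tagger('-Owakati')  # -Owakati: output space-separated tokens only
tagger.parse('')  # dummy parse; see the GC note below
text = 'Natural language processing is fun'
result = tagger.parse(text)  # the UnicodeDecodeError is raised here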
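The error message says that Python received bytes that are not valid UTF-8. The following is a minimal sketch that produces the same kind of error without MeCab. It assumes (as the resolution below suggests) that the dictionary emits Shift-JIS bytes while the binding decodes them as UTF-8. The helper try_decode and the Japanese sample sentence are hypothetical and only for illustration.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Same exception type as in the question; the exact byte positions
        # depend on which characters come first.
        print(f"{encoding}: {exc}")
        return None


# Hypothetical Japanese sample, encoded the way a Shift-JIS dictionary would return it.
text = "自然言語処理は楽しい"
sjis_bytes = text.encode("shift_jis")

try_decode(sjis_bytes, "utf-8")              # fails: Shift-JIS bytes are not valid UTF-8
print(try_decode(sjis_bytes, "shift_jis"))   # succeeds once the matching codec is used
```

The point is that the bytes are intact. Only the encoding used to decode them is wrong, so the fix belongs on the MeCab/dictionary side, not in the Python string handling.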
My development environment: OS: Windows 7 32-bit, Python 3.6.0 (Anaconda3 4.3.1), MeCab 0.996.
For mecab-python, I installed the mecab-python-windows package with pip, following the instructions here.
The tagger.parse('') line is there because I read that it is needed to keep the parsed strings from being collected by Python's garbage collector.
Has anyone run into the same problem, and if so, how did you resolve it?
I solved it myself. Sorry for the trouble.
When I installed MeCab itself, I did not select UTF-8 as the character encoding. I assumed UTF-8 would work anyway because I later ran "Recompile UTF-8 dictionary" from the Start menu. However, after reinstalling MeCab and selecting UTF-8 as the character encoding during installation, the behavior changed and everything worked. Apparently "Recompile xxxx dictionary" does not take effect.
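Before reinstalling, it can help to confirm which character set the installed dictionary actually uses. The following is a sketch under stated assumptions. It assumes that the command-line `mecab -D` prints the dictionary information as "key:<TAB>value" lines including a charset line. The helper names parse_dictionary_info and is_utf8_charset, and the sample text, are hypothetical. Replace the sample with the real output from your machine.

```python
def parse_dictionary_info(text: str) -> dict:
    """Turn 'key:<TAB>value' lines (assumed mecab -D format) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so Windows paths like C:\... survive.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_utf8_charset(charset: str) -> bool:
    """Accept common spellings of UTF-8 such as utf8, UTF-8 and utf_8."""
    return charset.replace("-", "").replace("_", "").lower() == "utf8"


# Hypothetical sample output; the values on a real installation will differ.
sample = (
    "filename:\tC:\\Program Files\\MeCab\\dic\\ipadic\\sys.dic\n"
    "version:\t102\n"
    "charset:\tSHIFT-JIS\n"
    "type:\t0\n"
)

info = parse_dictionary_info(sample)
print(info["charset"], is_utf8_charset(info["charset"]))
```

If the reported charset is not UTF-8 while the Python binding expects UTF-8, you get the mismatch described in the question. The mecab-python bindings may also expose the same information through the tagger's dictionary_info() object. This sketch sticks to the command-line output so that it does not depend on a particular binding build.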
Thank you for your reply, Kenjinoguchi.
© 2024 OneMinuteCode. All rights reserved.