As the title suggests, I would like to use MeCab in Python to tokenize a list of texts and remove stopwords from it.
However, I get the following error: TypeError: in method 'Tagger_parse', argument 2 of type 'char const *'.
The environment is
Python 3.9.7
mecab-python3
Example Code:
import urllib
from urllib.request import urlopen
import MeCab
import re
# slothlib
slothlib_path="http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
slot_file=urllib.request.urlopen(slothlib_path)
# stopwordsiso
iso_path="https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file=urllib.request.urlopen(iso_path)
stopwords = [line.decode("utf-8").strip() for line in iso_file]
stopwords = [ss for ss in stopwords if not ss == u'']
stopwords=list(set(stopwords))
with open("/Desktop/cleaned-stp.txt", encoding='utf8') as f:
    cleanedlist=f.readlines()
cleanedlist=list(cleanedlist)
tagger=MeCab.Tagger("-Owakati")
token_text=tagger.parse(cleanedlist)  # the TypeError is raised here
ws = re.compile(" ")
words = [word for word in ws.split(token_text)]
if words[-1]==u"\n":
    words = words[:-1]
ws = [w for w in words if w not in stopwords]
print(words)
print(ws)
Example List (.txt):
Done!My score is 100 How much do you know about magic?Let's do a test!You'll also get a chance to get a surprise reward like collaboration ride skin!wilderness magic test magic battle wilderness behavior
"""Girls' Wars: Fantasy Unification Battle"" is being pre-registered!" Participate in the reserved-only gacha and get SSR characters and items!43172 total revolutions!! Top 10 Advance Reservation Pre-Gacha Reservations
I will watch the magic round-the magic-play the magic round.
The 2nd Women's Cup officially hosted by Wilderness CUP!! Here's what to see! - A series of unique combinations! The most luxurious camp ever! - There are many nostalgic combinations that trace the history of Wilderness! - The battle to decide the last queen of the year begins!Distribution URL: Wilderness Behavior
I'm sorry that I asked a simple question as a beginner.
Thank you for your cooperation.
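The error occurs because you are passing a list of strings to a function that expects a single string: f.readlines() returns a list, while tagger.parse() takes a str.
Read the file as one string with f.read() and pass that to the tagger, as shown in https://taku910.github.io/mecab/bindings.html.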
with open("/Desktop/cleaned-stp.txt", encoding='utf8') as f:
    cleaned_text=f.read()
tagger=MeCab.Tagger("-Owakati")
token_text=tagger.parse(cleaned_text)
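If you would rather keep the result separated per line of the file, another option is to call parse() once per line and filter each token list. The following is only a minimal sketch: remove_stopwords and run_with_mecab are hypothetical helper names that are not part of MeCab, and it assumes the tagger is created with "-Owakati" so that parse() returns space-separated tokens.

```python
def remove_stopwords(lines, parse, stopwords):
    """Tokenize each line with parse() and drop tokens listed in stopwords."""
    stopset = set(stopwords)  # set membership tests avoid scanning a list
    cleaned = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        # parse() receives one str at a time, never the whole list
        tokens = parse(text).split()
        cleaned.append([t for t in tokens if t not in stopset])
    return cleaned


def run_with_mecab(path, stopwords):
    """Hypothetical wrapper: read a text file and filter it using MeCab."""
    import MeCab  # imported lazily so remove_stopwords() works without it

    tagger = MeCab.Tagger("-Owakati")
    with open(path, encoding="utf8") as f:
        return remove_stopwords(f.readlines(), tagger.parse, stopwords)
```

Calling run_with_mecab("/Desktop/cleaned-stp.txt", stopwords) should then return one list of remaining tokens per non-empty line. Using split() with no argument also discards the trailing newline that parse() appends, so the separate check for "\n" is no longer needed.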
I'm also a beginner, so I don't know if this will be helpful, but I got the same error while using MeCab (just now) and had a hard time with it, so I'll share what I found just in case!
In my case, I got this error when I was tokenizing (word-splitting) a column of a DataFrame. When I changed the type of the column from object to str, it worked well. I don't know about lists, but I think the data type is probably the cause. Good luck!!!
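As a rough plain-Python illustration of the same idea, you can convert each value to str before handing it to the tagger. This is only a sketch under my own assumptions: as_text is a hypothetical helper name, and treating None as an empty string is a choice I made for the example, not something MeCab requires.

```python
def as_text(value):
    """Return value as a str so it is safe to pass to tagger.parse()."""
    if isinstance(value, str):
        return value  # already the type parse() expects
    if value is None:
        return ""  # assumption: treat missing values as empty text
    return str(value)  # numbers and other objects become their text form
```

With this helper, something like tagger.parse(as_text(value)) for each value in the column should no longer fail on non-string entries.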