How RMeCab Handles Files Written in UTF-8

Asked 2 years ago, Updated 2 years ago, 34 views

I have a question about R, the free statistical software.
I'm a beginner, so I may be stumbling over something very simple.

Currently, I'm trying to analyze Japanese text saved in UTF-8 with RMeCab, but the characters come out garbled.

The function I am trying to use is RMeCabFreq().
I searched the Internet extensively but couldn't find a solution. Please tell me how to solve this.

What I tried
Changing the locale to "UTF-8" with the Sys.setlocale() function
Result: a warning that the OS reports the request to set the locale to "UTF-8" cannot be honored

options(encoding="UTF-8")
Result: various problems occurred. If possible, please also explain what effect specifying encoding in options() has.

r

2022-09-30 19:03

2 Answers

Judging from the question, you are using MeCab, R, and RMeCab on Windows.
On Windows, both MeCab and R should assume Shift-JIS by default.
Therefore, converting the target file itself to Shift-JIS is the easiest approach.
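
To confirm what encoding the R session actually assumes, something like the following may help. This is only a sketch using base R; the variable names ctype and info are placeholders, and the output depends on your OS and R version.

```r
# Report the locale R currently uses for character handling.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session is multibyte and/or UTF-8
# (on Windows it also lists the code page).
info <- l10n_info()
print(info)

if (isTRUE(info[["UTF-8"]])) {
  message("R's native encoding is UTF-8.")
} else {
  message("R's native encoding is not UTF-8.")
}
```

If the session turns out not to be UTF-8, that would be consistent with the UTF-8 file being misread somewhere between R and MeCab.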

If you cannot change the file's character encoding for some reason, you can read it as UTF-8 and convert it to Shift-JIS inside R. If you need that, please ask again.
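
A minimal sketch of that in-R conversion, assuming base R's iconv() and that "CP932" (the Windows flavor of Shift-JIS) is available on your system. The names src, dst, lines_utf8, and lines_sjis are hypothetical, and the sample file is created only so the example runs on its own.

```r
# Hypothetical paths; replace them with your own files.
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")

# Create a small UTF-8 sample file so the sketch is self-contained.
writeLines(enc2utf8("\u65e5\u672c\u8a9e"), src, useBytes = TRUE)

# Read the lines and mark them as UTF-8.
lines_utf8 <- readLines(src, encoding = "UTF-8")

# Convert each line from UTF-8 to CP932; unconvertible lines become NA.
lines_sjis <- iconv(lines_utf8, from = "UTF-8", to = "CP932")

# useBytes = TRUE writes the converted bytes as they are, without re-encoding.
writeLines(lines_sjis, dst, useBytes = TRUE)
```

The file at dst could then be passed to RMeCabFreq(), provided MeCab's dictionary is also Shift-JIS as assumed above.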


2022-09-30 19:03

From your question, I can't tell at which point the characters get garbled, but if the file is saved correctly as UTF-8, why not open it in an editor such as TeraPad, change the character encoding, and save it again?

Or

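# Read the file and mark the strings as UTF-8
x <- readLines("utf-8.txt", encoding = "UTF-8")
# write() outputs in R's native encoding (Shift-JIS on the Windows setup assumed here)
write(x, "shif.txt")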

Running the code above should change the character encoding.
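
The write() call above relies on R's native encoding being Shift-JIS. As a sketch, the encodings can instead be named explicitly on file connections. This assumes the session's native encoding can represent Japanese (for example a UTF-8 or CP932 locale); src, dst, in_con, and out_con are hypothetical names. In base R the encoding argument of file() defaults to getOption("encoding"), so options(encoding = "UTF-8") changes that default for connections opened afterwards, which may be related to the problems seen when it is set globally.

```r
# Hypothetical paths; tempfile() keeps the sketch self-contained.
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines(enc2utf8("\u65e5\u672c\u8a9e"), src, useBytes = TRUE)

# Declare the input encoding on the connection; R re-encodes while reading.
in_con <- file(src, open = "r", encoding = "UTF-8")
x <- readLines(in_con)
close(in_con)

# Declare the output encoding on the connection; R converts while writing.
out_con <- file(dst, open = "w", encoding = "CP932")
writeLines(x, out_con)
close(out_con)
```

Setting the encoding per connection keeps the change local to one file, rather than altering how every connection in the session behaves.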

Alternatively, convert with iconv() partway through the processing, as follows:

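library(dplyr)   # provides the %>% pipe
library(rvest)   # read_html(), html_nodes(), html_text()
library(RMeCab)

# Scrape the review texts from the page
usedCars <- read_html("http://www.goo-net.com/car_review/index.html")
comments <- html_nodes(usedCars, ".txt_review") %>% html_text()

# Convert from UTF-8 to the native encoding (iconv's default target)
comments <- iconv(comments, from = "UTF-8")

# Strip whitespace and write the text to a temporary file
x <- tempfile()
write(x = gsub("[[:space:]]", "", comments), file = x)

# Count morpheme frequencies with RMeCab
frq <- RMeCabFreq(x)
head(frq)

# Delete the temporary file
unlink(x)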

The first few lines scrape elements from a web page; if you have questions about that part, please ask them separately.


2022-09-30 19:03



© 2024 OneMinuteCode. All rights reserved.