I have a question about R, the free statistics software.
I'm a beginner, so I might be stumbling over something very simple.
Currently, I'm trying to analyze Japanese text saved in UTF-8 with RMeCab, but the characters come out garbled. I am trying to use the RMeCabFreq() function.
I searched the Internet extensively but couldn't find a solution. Please tell me how to fix this.
What I tried:
Changing the locale to "UTF-8" with the Sys.setlocale() function
Result: a warning that the OS did not accept the request to set the locale to "UTF-8"
options(encoding="UTF-8")
Result: various problems occur → If possible, please also explain what specifying encoding in options() actually does.
Judging from the question, you are using MeCab, R, and RMeCab on Windows.
All of them should assume Shift-JIS by default there.
Therefore, converting the target file itself to Shift-JIS is the easiest approach.
If you cannot change the file's character encoding for some reason, you can read it as UTF-8 and convert it to Shift-JIS inside R. If you need that, please ask again.
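The in-R conversion mentioned above could look roughly like the following. This is only a sketch: the helper name utf8_file_to_sjis is made up for illustration, and it assumes your iconv build recognizes the name "CP932" (the Windows variant of Shift-JIS).

```r
# Hypothetical helper (not part of R or RMeCab): re-encode a UTF-8 text
# file as CP932, the Windows variant of Shift-JIS.
utf8_file_to_sjis <- function(in_file, out_file) {
  # Read the lines and mark them as UTF-8.
  x <- readLines(in_file, encoding = "UTF-8", warn = FALSE)

  # Convert the bytes from UTF-8 to CP932.
  # Lines that cannot be converted come back as NA.
  x_sjis <- iconv(x, from = "UTF-8", to = "CP932")
  if (anyNA(x_sjis)) {
    warning("some lines could not be converted to CP932")
  }

  # useBytes = TRUE writes the converted bytes as-is, without re-encoding.
  writeLines(x_sjis, out_file, useBytes = TRUE)
  invisible(out_file)
}
```

You would call it as utf8_file_to_sjis("utf-8.txt", "sjis.txt") (both file names are placeholders) and then pass the converted file to RMeCabFreq().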
From your question, I can't tell at which point the characters got garbled, but if the file is saved correctly as UTF-8, why not open it in an editor such as TeraPad, change the character encoding, and save it again?
Or:
# Read the file as UTF-8, then write it back out in R's native encoding
x <- readLines("utf-8.txt", encoding = "UTF-8")
write(x, "shif.txt")
If you run this, the file should be written out in the new character encoding.
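Note that, as far as I know, write() outputs text in R's native encoding, so the two lines above produce Shift-JIS only when the session's native encoding is Shift-JIS (CP932). A quick way to check what your session uses might look like this; the helper name native_is_utf8 is made up for illustration.

```r
# Hypothetical helper: TRUE when the session's native encoding is UTF-8.
native_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# Show the locale used for character handling in this session.
Sys.getlocale("LC_CTYPE")

# If this is TRUE, write() will emit UTF-8 rather than Shift-JIS.
native_is_utf8()
```

If the check returns TRUE, an explicit conversion with iconv() is the safer route.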
Alternatively, convert with iconv() partway through the processing, as follows:
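library(dplyr)
library(rvest)
# Scrape the review text from the page
usedCars <- read_html("http://www.goo-net.com/car_review/index.html")
comments <- html_nodes(usedCars, '.txt_review') %>% html_text()
# Convert from UTF-8 to the native encoding (Shift-JIS on Japanese Windows)
comments <- iconv(comments, from = "UTF-8")
# Strip whitespace and write the text to a temporary file
x <- tempfile()
write(x = gsub("[[:space:]]", "", comments), file = x)
library(RMeCab)
# Count morpheme frequencies, then delete the temporary file
frq <- RMeCabFreq(x)
head(frq)
unlink(x)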
The first few lines scrape elements from a web page; if you have questions about that part, I recommend asking them separately.
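To try the iconv() step on its own, without scraping a site or installing RMeCab, something like the following may help. It is only a sketch: the sample strings are illustrative stand-ins for scraped text, written with \u escapes so the script itself stays ASCII.

```r
# Illustrative UTF-8 strings standing in for scraped review text.
comments <- c("\u3053\u308c\u306f \u30c6\u30b9\u30c8", " \u65e5\u672c\u8a9e\n")

# Strip all whitespace while the text is still UTF-8.
cleaned <- gsub("[[:space:]]", "", comments)

# Convert to the session's native encoding (to = "" means "current locale").
# Strings the native encoding cannot represent come back as NA.
native <- iconv(cleaned, from = "UTF-8", to = "")

# Write to a temporary file, as one would before calling RMeCabFreq().
tmp <- tempfile(fileext = ".txt")
writeLines(native, tmp)
readLines(tmp)
unlink(tmp)
```

On a Japanese Windows setup the temporary file would then be in Shift-JIS, which is what the approach above relies on.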