Could you tell me how to remove punctuation marks (、 and 。) and quotation marks (" and ') from a sentence before running morphological analysis with RMeCab? I tried the code below, but the characters were not removed.
data_clean <- gsub('。', '', data)
data_clean <- gsub('、', '', data)
data_clean <- gsub(',', '', data)
data_clean <- gsub('"', '', data)
data_clean <- gsub("'", '', data)
Also, these characters come in two forms, full-width and half-width, which may be why some of them were missed. (Does R distinguish between full-width and half-width characters?)
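You can include the half-width ' as \' and the half-width " as \" in the pattern. gsub() can also match several alternatives in a single call if you separate them with | inside the pattern. R treats the full-width and half-width forms as different characters, so list both forms of every symbol you want to remove.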
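To illustrate why both forms have to be listed, here is a minimal sketch that prints the code points of the two forms. It assumes that "half-width" refers to the half-width forms ｡ (U+FF61) and ､ (U+FF64); the variable names are made up for this example, and the characters are written as \u escapes so the result does not depend on the file encoding.

```r
# Full-width and half-width forms of the Japanese period and comma.
marks <- c(fullwidth_period = "\u3002", halfwidth_period = "\uff61",
           fullwidth_comma  = "\u3001", halfwidth_comma  = "\uff64")

# utf8ToInt() returns the Unicode code point of a single character.
codes <- vapply(marks, utf8ToInt, integer(1))
print(codes)

# Hypothetical input: a, full-width period, i, half-width period, u (in hiragana).
s <- "\u3042\u3002\u3044\uff61\u3046"

# A pattern with only the full-width period leaves the half-width one in place.
only_full <- gsub("\u3002", "", s)

# Listing both forms with | removes both.
both <- gsub("\u3002|\uff61", "", s)
```

Because the code points differ, a pattern written with only one form will not match the other.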
# Example data (the full-width and half-width form of each symbol is inserted between the characters, in order from the left; the half-width " is written as \" because the string itself is delimited by "")
x <- "あ。い｡う、え､お”か\"き’く\'け"
# Specify both the full-width and the half-width form of 。, 、, " and ' as the pattern and remove them.
gsub(pattern = "。|｡|、|､|”|\"|’|\'", replacement = "", x)
#[1] "あいうえおかきくけ"
# Pattern match using a regular-expression character class (thanks to @MichaelChirico for the comment)
gsub(pattern = "[。｡、､”\"’\']", replacement = "", x)
library(stringr)
str_replace_all(x, pattern = "。|｡|、|､|”|\"|’|\'", replacement = "") # Same result as above
# Alternative: extract the characters you want to keep instead of deleting the rest
a <- str_extract_all(x, pattern = "\\p{Hiragana}|\\p{Katakana}|\\p{Han}")
paste(a[[1]], collapse = "")
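Separately from the pattern syntax, one thing that may be worth checking in the code from the question is that every call reads from data and overwrites data_clean, so only the effect of the last call survives. The sketch below chains the calls instead. The input string and the names last_only and targets are hypothetical, and fixed = TRUE is used here so that each entry is taken literally (a half-width . would otherwise match any character in a regular expression).

```r
# Hypothetical input: hiragana separated by a full-width period, a full-width
# comma, a half-width double quote and a half-width single quote.
data <- "\u3042\u3002\u3044\u3001\u3046\"\u3048'\u304a"

# Pattern from the question: each call starts again from `data`,
# so only the last substitution is kept in the result.
last_only <- gsub("\u3002", "", data)
last_only <- gsub("'", "", data)

# Chained version: each call works on the result of the previous one.
targets <- c("\u3002", "\u3001", "\"", "'")
data_clean <- data
for (p in targets) {
  # fixed = TRUE treats `p` as a literal string, not as a regular expression.
  data_clean <- gsub(p, "", data_clean, fixed = TRUE)
}
print(data_clean)
```

Here last_only still contains the period, the comma and the double quote, while data_clean keeps only the hiragana.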