How to Remove Punctuation and Quotation Marks with gsub

Asked 2 years ago, Updated 2 years ago, 32 views

Could you tell me how to remove punctuation marks (、。) and quotation marks (” ’) from sentences that I am running through morphological analysis with RMeCab? I tried the code below, but the characters were not removed.

data_clean <- gsub('。', '', data)
data_clean <- gsub('、', '', data)
data_clean <- gsub(',', '', data)
data_clean <- gsub('”', '', data)
data_clean <- gsub('’', '', data)

r

2022-09-30 10:50

1 Answer

Each of these symbols (。, 、 and the quotation marks) exists in both a full-width and a half-width form, which may be why some of them were missed: R treats the full-width and half-width forms as different characters. You can include a half-width ' as \' and a half-width " as \" in the pattern. gsub() can match several patterns in one call if you join them with |.
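As a quick check of the point about character width, here is a minimal sketch using a made-up string `s` (not the asker's data; the variable names are only for illustration). It shows that a pattern containing only the full-width 。 leaves the half-width ｡ in place:

```r
# Hypothetical sample: one full-width (U+3002) and one half-width (U+FF61) full stop
s <- "あ。い｡う"

# R compares the two widths as different characters, so this is FALSE
same_char <- "。" == "｡"

# A pattern with only the full-width form does not touch the half-width one
only_full <- gsub("。", "", s)

# Listing both forms, joined with |, removes both
both <- gsub("。|｡", "", s)
```

If your source file or console is not in UTF-8, writing the characters as "\u3002" and "\uff61" may be more robust than typing them directly.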

# Example data (the full-width and half-width version of each symbol is inserted in turn from the left; " is written as \" because the string is delimited by "")
x <- "あ。い｡う、え､お”か\"き’く\'け"

# Both full-width and half-width: join the symbols with | as the pattern and remove them.
gsub(pattern = "。|｡|、|､|”|\"|’|\'", replacement = "", x)
#[1] "あいうえおかきくけ"

# Pattern match using a regular-expression character class (Dear @MichaelChirico, thank you for your comment)
gsub(pattern = "[。｡、､”\"’\']", replacement = "", x)

library(stringr)
str_replace_all(x, pattern = "。|｡|、|､|”|\"|’|\'", replacement = "")  # Same as above

# Keep only hiragana, katakana and kanji
a <- str_extract_all(x, pattern = "\\p{Hiragana}|\\p{Katakana}|\\p{Han}")
paste(a[[1]], collapse = "")  # Processed by extraction instead of deletion
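
If the text may also contain ASCII periods and commas, keep in mind that an unescaped . in a regular expression matches any character, so a call like gsub('.', '', data) would delete everything. A minimal sketch of a helper that covers those cases too, assuming the input is a character vector (`remove_marks` and `sample_text` are hypothetical names, not part of the answer above):

```r
# Hypothetical helper: strip Japanese and ASCII punctuation plus quotation marks.
# Inside [...] the ASCII period is a literal character, so it needs no escaping there.
remove_marks <- function(text) {
  gsub("[。｡、､.,“”\"‘’']", "", text)
}

# Made-up sample input; gsub() is vectorised, so each element is cleaned
sample_text <- c("今日は、晴れ。", "He said \"hi\".", "a,b.c")
cleaned <- remove_marks(sample_text)
```

Because the result is assigned once from a single pattern, none of the removals is lost, which can happen when several gsub() calls each start again from the original `data`.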


2022-09-30 10:50


