R - Prevent tm from removing stopwords from double words


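I'm trying to remove stopwords from a character vector. The problem I'm facing is that one of the entries contains the phrase "king kong". Since "king" is one of my stopwords, the "king" in "king kong" is getting removed.

Is there a way to keep stopwords from being removed when they are part of a two-word phrase? My code is: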

    library(tm)

    # newmnt1$form: chr [1:4] "king kong lives" "foot" "island" "skull"
    text <- VCorpus(VectorSource(newmnt1$form))

    # Normal standardization of text.
    text <- tm_map(text, content_transformer(tolower))
    text <- tm_map(text, removeWords, custom_stopwords)
    text <- tm_map(text, stripWhitespace)
    newmnt2 <- text[[1]]$content

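One quick hack is to convert "king kong" patterns to "king_kong" before removing the stopwords. The underscore counts as a word character, so "king" no longer matches as a separate word.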

    a <- gsub("king kong", "king_kong", "this pattern king , king kong")
    a
    # [1] "this pattern king , king_kong"

    tm::removeWords(a, "king")
    # [1] "this pattern  , king_kong"
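The same idea can be wrapped in a small helper that protects several phrases, removes the stopwords, and then restores the spaces. This is only a minimal sketch: `remove_words_keep_phrases` is a hypothetical name, the `"the king lives"` entry is made up for illustration, and the stopword step uses a plain word-boundary regex in base R as a stand-in for `tm::removeWords` so that the example runs without tm installed. It also assumes the text does not already contain the underscore-joined forms.

```r
# Hypothetical helper: remove stopwords but leave the listed phrases intact.
remove_words_keep_phrases <- function(x, stopwords, phrases) {
  # 1. Join each protected phrase with underscores so it reads as one token.
  for (p in phrases) {
    x <- gsub(p, gsub(" ", "_", p, fixed = TRUE), x, fixed = TRUE)
  }
  # 2. Remove stopwords as whole words. "_" is a word character, so there is
  #    no word boundary after "king" in "king_kong" and it is left alone.
  pattern <- sprintf("\\b(%s)\\b", paste(stopwords, collapse = "|"))
  x <- gsub(pattern, "", x, perl = TRUE)
  # 3. Turn the protected tokens back into the original phrases.
  for (p in phrases) {
    x <- gsub(gsub(" ", "_", p, fixed = TRUE), p, x, fixed = TRUE)
  }
  # 4. Collapse the extra whitespace left behind by removed words.
  trimws(gsub("\\s+", " ", x))
}

forms <- c("king kong lives", "foot", "island", "skull", "the king lives")
result <- remove_words_keep_phrases(
  forms,
  stopwords = c("king", "the"),
  phrases = "king kong"
)
print(result)
# "king kong lives" "foot" "island" "skull" "lives"
```

Inside a tm pipeline, a function like this could be applied with `tm_map(text, content_transformer(...))` in place of the `removeWords` step.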

Best,

Colin

