R - Prevent tm from removing stopwords from double words
I'm trying to remove stopwords from a character vector. The problem I'm facing is that the vector contains the phrase "king kong". Since "king" is one of the stopwords, the "king" in "king kong" is getting removed.
Is there a way to avoid double words being removed? My code is:
    text <- VCorpus(VectorSource(newmnt1$form))
    # newmnt1$form: chr [1:4] "king kong lives" "foot" "island" "skull"

    # normal standardization of text
    text <- tm_map(text, content_transformer(tolower))
    text <- tm_map(text, removeWords, custom_stopwords)
    text <- tm_map(text, stripWhitespace)
    newmnt2 <- text[[1]]$content
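One quick hack is to convert "king kong" patterns to "king_kong" before removing the stopwords. removeWords only matches whole words, and the underscore counts as a word character, so "king" inside "king_kong" is no longer matched.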
    a <- gsub("king kong", "king_kong", "this pattern king , king kong")
    a
    [1] "this pattern king , king_kong"
    tm::removeWords(a, "king")
    [1] "this pattern , king_kong"
Best,
Colin
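The underscore hack can also be applied inside the tm pipeline from the question. The following is only a minimal sketch under stated assumptions: `docs`, the contents of `custom_stopwords`, the `protected` vector, and the helper `protect_phrases` are all hypothetical names and values introduced for illustration, not part of the original code. It protects each multi-word phrase before `removeWords` runs, then turns the underscores back into spaces afterwards.

```r
library(tm)

# Hypothetical inputs standing in for newmnt1$form and the asker's stopword list.
docs <- c("king kong lives", "foot", "island", "skull")
custom_stopwords <- c("king", "lives")
protected <- c("king kong")  # multi-word terms that must survive intact

# Hypothetical helper: join the words of each protected phrase with "_",
# so a stopword inside the phrase is no longer a standalone word.
protect_phrases <- function(x, phrases) {
  for (p in phrases) {
    x <- gsub(p, gsub(" ", "_", p, fixed = TRUE), x, fixed = TRUE)
  }
  x
}

text <- VCorpus(VectorSource(docs))
text <- tm_map(text, content_transformer(tolower))
# Protect the phrases after lowercasing and before stopword removal.
text <- tm_map(text, content_transformer(protect_phrases), phrases = protected)
text <- tm_map(text, removeWords, custom_stopwords)
text <- tm_map(text, stripWhitespace)

# Pull the cleaned strings back out, restore the spaces, and trim the edges.
result <- vapply(text, as.character, character(1), USE.NAMES = FALSE)
result <- trimws(gsub("_", " ", result, fixed = TRUE))
result
```

Two caveats with this sketch: the final `gsub` also converts any underscores that were already in the text, and the fixed-string match in `protect_phrases` would also hit "king kong" inside a longer word such as "viking kong". If either matters for your data, a rarer placeholder or a word-boundary regex may be a better choice.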