regex - How do I filter "CUSTOM" out of a string but not "CUSTOMER" in R? Grepl?
I'm trying to filter custom items out of a data frame in R using the item descriptions. I want to get rid of items with "custom" in the description, but I need to keep items with "customer" in the description. I tried using the grepl function, to no avail. I've got 800,000+ rows of data, so something speedy would be helpful. This is one filter out of many; I'm using dplyr and pipe operators for the other filters.
Generic code:

    > items <- c("a", "b", "c")
    > desc <- c("custom stamp", "customer 4x6 in stamp", "4x6 generic stamp")
    > df <- data.frame(items = items, item_desc = desc)
    > df
      items             item_desc
    1     a          custom stamp
    2     b customer 4x6 in stamp
    3     c     4x6 generic stamp

I've tried this:

    library(dplyr)
    df <- df %>% filter(!grepl("custom", item_desc, fixed = TRUE))

But obviously, the result is:

    > df
      items         item_desc
    1     c 4x6 generic stamp

Whereas the desired result would be:

    > df
      items             item_desc
    1     b customer 4x6 in stamp
    2     c     4x6 generic stamp

Thanks!
You need to use a regular expression here that utilizes word boundaries: "\\bcustom\\b".
To make it work, you need to remove the fixed = TRUE argument, which makes the engine treat the pattern as a literal string rather than a regular expression.
Use

    df <- df %>% filter(!grepl("\\bcustom\\b", item_desc))

This pattern matches "custom" only as a whole word, so "customer" is not matched. The items that do not match remain in df because the result of grepl is inverted by the ! operator.
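To see the difference between the three calls side by side, here is a minimal base-R sketch using the sample descriptions from the question. The variable names (substring_hit, word_hit, literal_hit, kept) are made up for this illustration, and it skips dplyr so it runs on its own.

```r
# Sample descriptions taken from the question.
desc <- c("custom stamp", "customer 4x6 in stamp", "4x6 generic stamp")

# Plain substring match: "custom" is also found inside "customer".
substring_hit <- grepl("custom", desc, fixed = TRUE)

# Word-boundary regex: matches "custom" only as a whole word.
word_hit <- grepl("\\bcustom\\b", desc)

# With fixed = TRUE the "\b" is taken literally, so nothing matches.
literal_hit <- grepl("\\bcustom\\b", desc, fixed = TRUE)

# Keep the descriptions that do NOT contain the whole word "custom".
kept <- desc[!word_hit]
print(kept)
```

If the real descriptions are upper case or mixed case (an assumption; the sample data here is lower case), adding ignore.case = TRUE to the grepl call should cover that. On 800,000+ rows it may also be worth timing the same call with perl = TRUE, which is sometimes faster than the default engine.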