regex - How to remove a middle name that appears twice in a name list in R
My name list has the following error: the middle name appears twice (for example, rows 1 and 2). The data is in data.table format with 100k observations and 15 variables, including the name column. How can I get the expected output by removing the middle name that appears twice?
name column                  expected
1.a michael michael aura     1.a michael aura
2.a thomas thomas parsa      2.a thomas parsa
3.a gul                      3.a gul
4.clark                      4.clark
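We can use sub. The pattern matches whitespace, then captures a word together with any trailing whitespace, then matches one or more immediate repeats of that capture (\\1+); the whole match is replaced by a single space followed by one copy of the captured word.

    sub("\\s+(\\w+\\s*)\\1+", " \\1", df1[,1])
    #[1] "1.a michael aura" "2.a thomas parsa" "3.a gul" "4.clark"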
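The pattern above has no word boundaries, so it may also collapse a repeat that sits inside a single token. A slightly more defensive variant can be sketched as follows. This is only an illustration, not the answer's method: the column name `name`, the helper `dedupe_words`, and the sample data frame are assumptions made for the example, and `perl = TRUE` is used so that `\\b` is handled by the PCRE engine.

```r
# Sample data mirroring the question; the column name `name` is an assumption.
df1 <- data.frame(
  name = c("1.a michael michael aura", "2.a thomas thomas parsa",
           "3.a gul", "4.clark"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: \\b(\\w+) captures one whole word, and (\\s+\\1\\b)+
# matches one or more immediate repeats of that same whole word.
# The entire match is replaced by a single copy of the word.
dedupe_words <- function(x) {
  gsub("\\b(\\w+)(\\s+\\1\\b)+", "\\1", x, perl = TRUE)
}

df1$expected <- dedupe_words(df1$name)
print(df1)
```

Note that `df1[, 1]` returns a plain vector only for a data.frame. If the data really is a data.table, the column would need to be extracted as `df1[[1]]` or `df1$name`, or updated by reference with something like `df1[, name := dedupe_words(name)]`.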