R - Find rows with a sequence of consecutive column values
Let's say I have a data frame like the one below, and I need to identify each row that has one or more missing values (NA) followed by at least one valid value (any numeric). Can anyone help me?
a <- c(1, 's06.4', 6.7, 7.0, 6.5, 7.0, 7.2, NA, NA, 6.6, 6.7)
b <- c(2, 's06.2', 5.0, NA, 4.9, 7.8, 9.3, 8.0, 7.8, 8.0, NA)
c <- c(3, 's06.5', 7.0, 5.5, NA, NA, 7.2, 8.0, 7.6, NA, 6.7)
d <- c(4, 's06.5', 7.0, 7.0, 7.0, 6.9, 6.8, 9.0, 6.0, 6.6, 6.7)
e <- c(5, 's06.1', 6.7, NA, NA, NA, NA, NA, NA, NA, NA)
df <- data.frame(rbind(a, b, c, d, e))
colnames(df) <- c('id','dx','dia01','dia02','dia03','dia04','dia05','dia06','dia07','dia08','dia09')
With:
df[rowSums(is.na(df[,3:10]) * !is.na(df[,4:11])) > 0,]
You get:
  id    dx dia01 dia02 dia03 dia04 dia05 dia06 dia07 dia08 dia09
a  1 s06.4   6.7     7   6.5     7   7.2  <NA>  <NA>   6.6   6.7
b  2 s06.2     5  <NA>   4.9   7.8   9.3     8   7.8     8  <NA>
c  3 s06.5     7   5.5  <NA>  <NA>   7.2     8   7.6  <NA>   6.7
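What this does:
- is.na(df[,3:10]) checks which of the values in the dia01 to dia08 columns are NA, and returns a logical matrix.
- !is.na(df[,4:11]) does the same for the next value in each row (the dia02 to dia09 columns), and returns a logical matrix that is TRUE where that next value is not NA.
- Multiplying these two matrices gives a 0/1 matrix that is 1 only where the required condition holds: an NA directly followed by a non-NA value.
- With rowSums you check whether the condition is met at least once in each row.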
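To make the shifted comparison concrete, here is a minimal sketch on a small made-up matrix. The object `m` and its values are purely illustrative and not part of the question's data. Note that checking adjacent pairs is enough: in any run of one or more NAs that is followed by a value, the last NA of the run sits directly before that value.

```r
# Toy data (hypothetical): 3 rows, 4 value columns
m <- matrix(c(1,  NA, 3,  4,    # row 1: an NA followed by a value
              1,  2,  3,  NA,   # row 2: NA only at the very end
              NA, NA, NA, NA),  # row 3: nothing but NAs
            nrow = 3, byrow = TRUE)

cur_is_na   <- is.na(m[, 1:3])    # is the current value missing?
next_not_na <- !is.na(m[, 2:4])   # is the next value present?

# 1 where an NA is directly followed by a non-NA value, 0 elsewhere
hits <- cur_is_na * next_not_na

# rows in which the pattern occurs at least once
keep <- rowSums(hits) > 0
keep
# [1]  TRUE FALSE FALSE
```

Only the first row qualifies; a trailing NA or an all-NA row never has a valid value after the gap.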
In response to your comment: if you want to make sure that an NA is followed by a numeric value, you can alter the above solution to:
# first convert the 'dia*' columns to numeric
df[-c(1,2)] <- lapply(df[-c(1,2)], function(x) as.numeric(as.character(x)))
# the rest is the same, because values that can't be converted to numeric become NA
df[rowSums(is.na(df[,3:10]) * !is.na(df[,4:11])) > 0,]
Or without converting to numeric first:
df[rowSums(is.na(df[,3:10]) * !is.na(sapply(df[4:11], function(x) as.numeric(as.character(x))))) > 0,]
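As a small illustration of why the conversion goes through as.character(), here is a sketch with a made-up vector. The name `x` and the "n/a" entry are assumptions for the example only; the question's data contains no such value.

```r
# Hypothetical values as they might appear in a factor column
x <- factor(c("6.7", "n/a", NA, "7.2"))

# Convert the labels, not the internal integer codes of the factor.
# Entries that are not valid numbers become NA (with a coercion warning,
# suppressed here to keep the output clean).
num <- suppressWarnings(as.numeric(as.character(x)))
num
# [1] 6.7  NA  NA 7.2

# Converting the factor directly returns the level codes instead,
# which is almost never what is wanted
as.numeric(x)
```

After this step a non-numeric entry is indistinguishable from a real NA, which is why the same rowSums filter can be reused unchanged.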
Note:
With the method you used to construct the example data, you end up with factor columns (or character columns in R 4.0.0 and later, where stringsAsFactors defaults to FALSE). I suppose you don't want that.
A possibly more correctly formatted example dataset would be:
df <- structure(list(id = c("1", "2", "3", "4", "5"),
                     dx = c("s06.4", "s06.2", "s06.5", "s06.5", "s06.1"),
                     dia01 = c(6.7, 5, 7, 7, 6.7),
                     dia02 = c(7, NA, 5.5, 7, NA),
                     dia03 = c(6.5, 4.9, NA, 7, NA),
                     dia04 = c(7, 7.8, NA, 6.9, NA),
                     dia05 = c(7.2, 9.3, 7.2, 6.8, NA),
                     dia06 = c(NA, 8, 8, 9, NA),
                     dia07 = c(NA, 7.8, 7.6, 6, NA),
                     dia08 = c(6.6, 8, NA, 6.6, NA),
                     dia09 = c(6.7, NA, 6.7, 6.7, NA)),
                .Names = c("id", "dx", "dia01", "dia02", "dia03", "dia04", "dia05", "dia06", "dia07", "dia08", "dia09"),
                row.names = c("a", "b", "c", "d", "e"),
                class = "data.frame")
The proposed method works on this dataset as well.
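If you are building the data yourself, one way to avoid the type problem from the start is to create the data frame column by column rather than through rbind(). The sketch below uses only the first three value columns to stay short; `df2` and `res` are hypothetical names chosen for this example.

```r
# Build the columns with their intended types directly; rbind() on mixed
# vectors would first coerce every value to character
df2 <- data.frame(
  id    = c("1", "2", "3", "4", "5"),
  dx    = c("s06.4", "s06.2", "s06.5", "s06.5", "s06.1"),
  dia01 = c(6.7, 5.0, 7.0, 7.0, 6.7),
  dia02 = c(7.0, NA, 5.5, 7.0, NA),
  dia03 = c(6.5, 4.9, NA, 7.0, NA),
  row.names = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# The value columns are numeric, so no conversion step is needed
sapply(df2[3:5], is.numeric)

# Same filter as before, restricted to the three value columns
res <- df2[rowSums(is.na(df2[, 3:4]) * !is.na(df2[, 4:5])) > 0, ]
res
```

With only dia01 to dia03 available, just row b has an NA followed by a value, so the result differs from the full nine-column example.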
As noted by @Frank in the comments, it is better to store your data in long format. With:
library(data.table)
setDT(df)[, 3:11 := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = 3:11][]
melt(df, id = 1:2)[, if(any(is.na(value) & !is.na(shift(value, type = 'lead')))) .SD, by = .(id, dx)]
You get:
    id    dx variable value
 1:  1 s06.4    dia01   6.7
 2:  1 s06.4    dia02   7.0
 3:  1 s06.4    dia03   6.5
 4:  1 s06.4    dia04   7.0
 5:  1 s06.4    dia05   7.2
 6:  1 s06.4    dia06    NA
 7:  1 s06.4    dia07    NA
 8:  1 s06.4    dia08   6.6
 9:  1 s06.4    dia09   6.7
10:  2 s06.2    dia01   5.0
11:  2 s06.2    dia02    NA
12:  2 s06.2    dia03   4.9
13:  2 s06.2    dia04   7.8
14:  2 s06.2    dia05   9.3
15:  2 s06.2    dia06   8.0
16:  2 s06.2    dia07   7.8
17:  2 s06.2    dia08   8.0
18:  2 s06.2    dia09    NA
19:  3 s06.5    dia01   7.0
20:  3 s06.5    dia02   5.5
21:  3 s06.5    dia03    NA
22:  3 s06.5    dia04    NA
23:  3 s06.5    dia05   7.2
24:  3 s06.5    dia06   8.0
25:  3 s06.5    dia07   7.6
26:  3 s06.5    dia08    NA
27:  3 s06.5    dia09   6.7
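The grouped condition compares each value with the next one within the same id. To show what that test evaluates to for a single group, here is a base R sketch. `lead1` is a hypothetical helper that mimics the lead-type shift for one vector; it is not part of data.table.

```r
# Hypothetical helper: the next element of x, with NA after the last one
lead1 <- function(x) c(x[-1], NA)

# dia01..dia09 for id 1 and id 5, taken from the example data
id1 <- c(6.7, 7.0, 6.5, 7.0, 7.2, NA, NA, 6.6, 6.7)
id5 <- c(6.7, NA, NA, NA, NA, NA, NA, NA, NA)

# TRUE at positions where an NA is directly followed by a value
is.na(id1) & !is.na(lead1(id1))

# id 1 has such a position and is kept; id 5 has none and is dropped
any(is.na(id1) & !is.na(lead1(id1)))   # TRUE
any(is.na(id5) & !is.na(lead1(id5)))   # FALSE
```

For id 1 the only hit is at dia07, the last NA before 6.6, which is enough for the whole group to be returned.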
Another alternative is:
setDT(df)[, 3:11 := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = 3:11][]
df[unique(melt(df, id = 1:2)[, .I[is.na(value) & !is.na(shift(value, type = 'lead'))], by = .(id, dx)], by = 'id')[,'id'], on = 'id']
The result of this approach is still in wide format, as presented in the first part of the answer.