R - Find rows with a sequence of consecutive column values


Let's say I have the data frame below, and I need to identify each row where one or more missing values (NA) are followed by at least one valid value (any numeric value). Can anyone help me?

a <- c(1, 's06.4', 6.7, 7.0, 6.5, 7.0, 7.2, NA, NA, 6.6, 6.7)
b <- c(2, 's06.2', 5.0, NA, 4.9, 7.8, 9.3, 8.0, 7.8, 8.0, NA)
c <- c(3, 's06.5', 7.0, 5.5, NA, NA, 7.2, 8.0, 7.6, NA, 6.7)
d <- c(4, 's06.5', 7.0, 7.0, 7.0, 6.9, 6.8, 9.0, 6.0, 6.6, 6.7)
e <- c(5, 's06.1', 6.7, NA, NA, NA, NA, NA, NA, NA, NA)

df <- data.frame(rbind(a, b, c, d, e))
colnames(df) <- c('id', 'dx', 'dia01', 'dia02', 'dia03', 'dia04', 'dia05',
                  'dia06', 'dia07', 'dia08', 'dia09')

With:

df[rowSums(is.na(df[,3:10]) * !is.na(df[,4:11])) > 0,]

you get:

  id    dx dia01 dia02 dia03 dia04 dia05 dia06 dia07 dia08 dia09
a  1 s06.4   6.7     7   6.5     7   7.2  <NA>  <NA>   6.6   6.7
b  2 s06.2     5  <NA>   4.9   7.8   9.3     8   7.8     8  <NA>
c  3 s06.5     7   5.5  <NA>  <NA>   7.2     8   7.6  <NA>   6.7

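What this does:

  • is.na(df[,3:10]) checks whether each value in the dia01 to dia08 columns is NA and returns a logical matrix.
  • !is.na(df[,4:11]) checks whether the next value in each row (columns dia02 to dia09) is not NA and also returns a logical matrix.
  • Multiplying these two matrices gives a matrix that is 1 wherever the required condition holds (an NA directly followed by a non-NA value) and 0 elsewhere.
  • With rowSums you check whether the condition is met at least once in each row.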
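The steps above can be traced on a small made-up data frame. This is only an illustrative sketch: the toy data and the intermediate names (is_missing, next_valid, hits, n_hits) are assumptions and do not come from the question.

```r
# Hypothetical toy data: two identifier columns followed by four
# numeric measurement columns.
toy <- data.frame(id = c("1", "2", "3"),
                  dx = c("x", "y", "z"),
                  d1 = c(1, NA, 1),
                  d2 = c(NA, NA, 2),
                  d3 = c(3, NA, NA),
                  d4 = c(4, NA, NA))

is_missing <- is.na(toy[, 3:5])       # d1..d3: is this value NA?
next_valid <- !is.na(toy[, 4:6])      # d2..d4: is the following value not NA?
hits       <- is_missing * next_valid # 1 where an NA is directly followed by a value

n_hits <- rowSums(hits)               # number of such positions per row
toy[n_hits > 0, ]                     # keeps only the first row of the toy data
```

Row 2 (all NA) and row 3 (values followed only by NA) are dropped, because in neither of them is an NA followed by a valid value.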

In response to your comment: if you want to make sure that the NA is followed by a numeric value, you can alter the above solution to:

# first convert the 'dia*' columns to numeric
df[-c(1,2)] <- lapply(df[-c(1,2)], function(x) as.numeric(as.character(x)))
# the same call as before works, because values that can't be converted to numeric become NA
df[rowSums(is.na(df[,3:10]) * !is.na(df[,4:11])) > 0,]

Or, without converting to numeric first:

df[rowSums(is.na(df[,3:10]) * !is.na(sapply(df[4:11], function(x) as.numeric(as.character(x))))) > 0,]

Note:

With the method you used to construct the example data, you end up with only factor columns. I suppose you don't want that.
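A short sketch of why this happens, using shortened, made-up versions of the vectors (the names m and df_bad are illustrative only). Because each vector mixes numbers with a string, c() coerces everything to character, so rbind() yields a character matrix. data.frame() then turns those columns into factors in R versions before 4.0.0 and leaves them as character from R 4.0.0 onwards; in neither case are they numeric.

```r
# Shortened, hypothetical versions of the vectors from the question.
a <- c(1, 's06.4', 6.7, NA)
b <- c(2, 's06.2', NA, 4.9)

m <- rbind(a, b)
is.character(m)          # TRUE: every element was coerced to character

df_bad <- data.frame(m)
sapply(df_bad, class)    # "factor" in R < 4.0.0, "character" in R >= 4.0.0

# as.numeric(as.character(x)) gives the intended numbers in both cases
vals <- as.numeric(as.character(df_bad[[3]]))
vals
```

Going through as.character() first matters for factors: calling as.numeric() directly on a factor returns the internal level codes rather than the original values.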

A possibly more correctly formatted example dataset would be:

df <- structure(list(id = c("1", "2", "3", "4", "5"),
                     dx = c("s06.4", "s06.2", "s06.5", "s06.5", "s06.1"),
                     dia01 = c(6.7, 5, 7, 7, 6.7),
                     dia02 = c(7, NA, 5.5, 7, NA),
                     dia03 = c(6.5, 4.9, NA, 7, NA),
                     dia04 = c(7, 7.8, NA, 6.9, NA),
                     dia05 = c(7.2, 9.3, 7.2, 6.8, NA),
                     dia06 = c(NA, 8, 8, 9, NA),
                     dia07 = c(NA, 7.8, 7.6, 6, NA),
                     dia08 = c(6.6, 8, NA, 6.6, NA),
                     dia09 = c(6.7, NA, 6.7, 6.7, NA)),
                .Names = c("id", "dx", "dia01", "dia02", "dia03", "dia04",
                           "dia05", "dia06", "dia07", "dia08", "dia09"),
                row.names = c("a", "b", "c", "d", "e"),
                class = "data.frame")

The proposed method works on this dataset as well.



As noted by @Frank in the comments, it is better to store your data in long format. With:

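library(data.table)
# convert the 'dia*' columns to numeric by reference
setDT(df)[, 3:11 := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = 3:11][]
# melt to long format and keep the groups where an NA is followed by a non-NA value
melt(df, id = 1:2)[, if(any(is.na(value) & !is.na(shift(value, type = 'lead')))) .SD, by = .(id, dx)]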

You get:

    id    dx variable value
 1:  1 s06.4    dia01   6.7
 2:  1 s06.4    dia02   7.0
 3:  1 s06.4    dia03   6.5
 4:  1 s06.4    dia04   7.0
 5:  1 s06.4    dia05   7.2
 6:  1 s06.4    dia06    NA
 7:  1 s06.4    dia07    NA
 8:  1 s06.4    dia08   6.6
 9:  1 s06.4    dia09   6.7
10:  2 s06.2    dia01   5.0
11:  2 s06.2    dia02    NA
12:  2 s06.2    dia03   4.9
13:  2 s06.2    dia04   7.8
14:  2 s06.2    dia05   9.3
15:  2 s06.2    dia06   8.0
16:  2 s06.2    dia07   7.8
17:  2 s06.2    dia08   8.0
18:  2 s06.2    dia09    NA
19:  3 s06.5    dia01   7.0
20:  3 s06.5    dia02   5.5
21:  3 s06.5    dia03    NA
22:  3 s06.5    dia04    NA
23:  3 s06.5    dia05   7.2
24:  3 s06.5    dia06   8.0
25:  3 s06.5    dia07   7.6
26:  3 s06.5    dia08    NA
27:  3 s06.5    dia09   6.7

Another alternative is:

setDT(df)[, 3:11 := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = 3:11][]
df[unique(melt(df, id = 1:2)[, .I[is.na(value) & !is.na(shift(value, type = 'lead'))], by = .(id, dx)], by = 'id')[,'id'], on = 'id']

The result of this approach is still in wide format, as presented in the first part of the answer.

