R - New columns based on an existing column and the column located next to it
My data frame looks like this:
    id t1 obs1 t2 obs2 t3 obs3
     1  0      11    d  0    g
     2  0    b 13    e 11
     3  0    c  0    f  0    h
I need to make sure each id has at least one t above 10 (and delete the row if not). Then I want to save the lowest t value above 10, and its corresponding obs, in new columns. (The complicated part of the question is that the lowest t above 10 may be in any of the t columns.) The obs corresponding to each t is located in the next column, which helps. The resulting data frame should look like this:
    id t1 obs1 t2 obs2 t3 obs3 lowesttabove10 correspondingobs
     1  0      11    d  0    g             11                d
     2  0    b 13    e 11                  11
With data.table, go to long format:
    library(data.table)
    setDT(dt)  # convert the data frame to a data.table by reference

    # melt the t* and obs* columns in parallel into two value columns
    dat = melt(dt, measure.vars = patterns("^t\\d+$", "^obs\\d+$"), value.name = c("t", "obs"))
    setorder(dat, id, variable)

    #    id variable  t obs
    # 1:  1        1  0
    # 2:  1        2 11   d
    # 3:  1        3  0   g
    # 4:  2        1  0   b
    # 5:  2        2 13   e
    # 6:  2        3 11
    # 7:  3        1  0   c
    # 8:  3        2  0   f
    # 9:  3        3  0   h
Find the max value per group, and mark the groups to keep:
    # sort by descending t, then take the first (largest) row per id
    iddt = dat[order(-t),
      .(max.variable = first(variable), max.t = first(t), max.obs = first(obs)),
      by=id]
    iddt[, keep := max.t > 10]

    #    id max.variable max.t max.obs  keep
    # 1:  2            2    13       e  TRUE
    # 2:  1            2    11       d  TRUE
    # 3:  3            1     0       c FALSE
Find the min value over 10 per kept group using a rolling update join:
    iddt[(keep), c("my.variable", "my.t", "my.obs") := {
      m = .(id = id, t_thresh = 10)
      # roll=-Inf rolls the next larger t back onto the threshold of 10
      dat[m, on=.(id, t = t_thresh), roll=-Inf, .(x.variable, x.t, x.obs)]
    }]

    #    id max.variable max.t max.obs  keep my.variable my.t my.obs
    # 1:  2            2    13       e  TRUE           3   11
    # 2:  1            2    11       d  TRUE           2   11      d
    # 3:  3            1     0       c FALSE          NA   NA     NA
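To see what the rolling join is computing, the same per-id result can be sketched with a plain filter and which.min. This is an alternative formulation for illustration, not the approach used above. The long table is re-created from the printed output, and the blank obs cells are assumed to be empty strings.

```r
library(data.table)

# Long-format example data, re-created from the printed output.
# Assumption: blank obs cells are empty strings.
dat <- data.table(
  id       = rep(1:3, each = 3),
  variable = rep(1:3, times = 3),
  t        = c(0, 11, 0,  0, 13, 11,  0, 0, 0),
  obs      = c("", "d", "g",  "b", "e", "",  "c", "f", "h")
)

# Keep only rows with t above the threshold, then take the row with the
# smallest t within each id. ids with no t above 10 (here id 3) drop out.
lowest <- dat[t > 10, .SD[which.min(t)], by = id]
print(lowest)
```

One difference worth keeping in mind: t > 10 is a strict comparison, whereas the rolling join would also match a t of exactly 10.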
I would stop here, with the main data in long format in dat, and the id-level variables in a separate table, iddt. To filter dat to the groups that should be kept:

    dat[iddt[(keep), .(id)], on=.(id)]

See ?data.table, and the other intro materials mentioned when you load the package, for details on the syntax. See ?dcast if you insist on going wide.
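If the wide layout is really needed, a minimal sketch of the filter-then-dcast step might look like the following. The data is re-created from the example, with blank obs cells assumed to be empty strings, and keep_ids is a hypothetical helper name that stands in for iddt[(keep), .(id)].

```r
library(data.table)

# Long-format example data (assumption: blank obs cells are empty strings)
dat <- data.table(
  id       = rep(1:3, each = 3),
  variable = factor(rep(1:3, times = 3)),
  t        = c(0, 11, 0,  0, 13, 11,  0, 0, 0),
  obs      = c("", "d", "g",  "b", "e", "",  "c", "f", "h")
)

# ids that have at least one t above 10 (hypothetical helper table)
keep_ids <- dat[, .(keep = max(t) > 10), by = id][(keep), .(id)]

# Filter the long table to those ids, then cast back to one row per id.
# With several value.var columns, dcast builds one output column per
# value/variable pair, e.g. t_1 and obs_1.
wide <- dcast(dat[keep_ids, on = .(id)], id ~ variable,
              value.var = c("t", "obs"))
print(wide)
```

The id-level columns from the separate table can then be joined onto this result by id if they are wanted alongside the wide data.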