r - Long to wide on multiple columns by data.table -
i have question data.table's melt , dcast multiple columns. browsed on stackoverflow many similar posts not want. explain below.
first, data causes of problems , value amount. here part of data:
id type problem1 value1 problem2 value2 problem3 value3 1 x 500 y 1000 z 400 2 x 600 z 700 3 b y 700 z 100 4 b w 200 v 200 5 c z 500 v 500 6 c x 1000 w 100 v 900 second, id unique. type contains three(a, b, , c). there 5 problems.
take id == 1 example. type a , contains 3 problems(x, y, , z). problem x has value 500, problem y has value 1000, problem z has value 400. take id == 5 example. type c , contains 2 problems(z , v). problem z has value 500 , problem v has value 500.
third, column id, type, problem1, problem2, , problem3 character. value1, value2, , value3 numeric.
the result want is:
type x y z w v 1100 1000 1100 0 0 b 0 700 100 200 200 c 1000 0 500 100 1400 i don't know how explain here properly. want group type , summation each problem's vlaue. think long wide. found reference here , here. second 1 may useful. however, have no idea begin. suggestions?
# data dt <- fread(" id type problem1 value1 problem2 value2 problem3 value3 1 x 500 y 1000 z 400 2 x 600 z 700 3 b y 700 z 100 4 b w 200 v 200 5 c z 500 v 500 6 c x 1000 w 100 v 900", fill = t)
we can first melt specifying patterns in measure 'long' format , dcast fun.aggregate sum
dcast(melt(dt, measure = patterns("^value", "^problem"), value.name = c("value", "problem"))[problem != "" ][, problem := factor(problem, levels = c("x", "y", "z", "w", "v"))], type ~problem, value.var = "value", sum, na.rm = true) # type x y z w v #1: 1100 1000 1100 0 0 #2: b 0 700 100 200 200 #3: c 1000 0 500 100 1400 melt data.table can take multiple patterns in measure argument. so, when "^value" matches columns have names start (^) "value" , "problem" , create 2 'value' columns. in above, naming columns 'value' , 'problem' value.name argument. dataset having blanks, long format have blank elements remove problem != "". next step important if need have columns in specific order. so, change 'problem' factor class , specified levels in order. now, melt part completed. long format changed 'wide' dcast specifying formula, value.var column , fun.aggregate (here sum)
Comments
Post a Comment