r - Long to wide on multiple columns by data.table -


i have question data.table's melt , dcast multiple columns. browsed on stackoverflow many similar posts not want. explain below.

first, data causes of problems , value amount. here part of data:

id   type    problem1    value1     problem2    value2    problem3    value3 1          x           500        y           1000      z           400 2          x           600        z           700        3    b       y           700        z           100 4    b       w           200        v           200 5    c       z           500        v           500        6    c       x           1000       w           100       v           900 

second, id unique. type contains three(a, b, , c). there 5 problems.

take id == 1 example. type a , contains 3 problems(x, y, , z). problem x has value 500, problem y has value 1000, problem z has value 400. take id == 5 example. type c , contains 2 problems(z , v). problem z has value 500 , problem v has value 500.

third, column id, type, problem1, problem2, , problem3 character. value1, value2, , value3 numeric.

the result want is:

type    x     y     z     w     v       1100  1000  1100  0     0    b       0     700   100   200   200 c       1000  0     500   100   1400   

i don't know how explain here properly. want group type , summation each problem's vlaue. think long wide. found reference here , here. second 1 may useful. however, have no idea begin. suggestions?

# data dt <- fread(" id   type    problem1    value1     problem2    value2    problem3    value3 1          x           500        y           1000      z           400 2          x           600        z           700        3    b       y           700        z           100 4    b       w           200        v           200 5    c       z           500        v           500        6    c       x           1000       w           100       v           900", fill = t)     

we can first melt specifying patterns in measure 'long' format , dcast fun.aggregate sum

dcast(melt(dt, measure = patterns("^value", "^problem"),      value.name = c("value", "problem"))[problem != ""      ][, problem := factor(problem, levels = c("x", "y", "z", "w", "v"))],       type ~problem, value.var = "value", sum, na.rm = true) #   type    x    y    z   w    v #1:    1100 1000 1100   0    0 #2:    b    0  700  100 200  200 #3:    c 1000    0  500 100 1400 

melt data.table can take multiple patterns in measure argument. so, when "^value" matches columns have names start (^) "value" , "problem" , create 2 'value' columns. in above, naming columns 'value' , 'problem' value.name argument. dataset having blanks, long format have blank elements remove problem != "". next step important if need have columns in specific order. so, change 'problem' factor class , specified levels in order. now, melt part completed. long format changed 'wide' dcast specifying formula, value.var column , fun.aggregate (here sum)


Comments

Popular posts from this blog

Is there a better way to structure post methods in Class Based Views -

performance - Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? -

c# - Asp.net web api : redirect unauthorized requst to forbidden page -