r - Long to wide on multiple columns by data.table -
i have question data.table
's melt
, dcast
multiple columns. browsed on stackoverflow many similar posts not want. explain below.
first, data
causes of problems , value amount. here part of data
:
id type problem1 value1 problem2 value2 problem3 value3 1 x 500 y 1000 z 400 2 x 600 z 700 3 b y 700 z 100 4 b w 200 v 200 5 c z 500 v 500 6 c x 1000 w 100 v 900
second, id
unique. type
contains three(a
, b
, , c
). there 5 problems.
take id == 1
example. type a
, contains 3 problems(x
, y
, , z
). problem x
has value 500
, problem y
has value 1000
, problem z
has value 400
. take id == 5
example. type c
, contains 2 problems(z
, v
). problem z
has value 500
, problem v
has value 500
.
third, column id
, type
, problem1
, problem2
, , problem3
character
. value1
, value2
, , value3
numeric
.
the result want is:
type x y z w v 1100 1000 1100 0 0 b 0 700 100 200 200 c 1000 0 500 100 1400
i don't know how explain here properly. want group type
, summation each problem's vlaue. think long wide. found reference here , here. second 1 may useful. however, have no idea begin. suggestions?
# data dt <- fread(" id type problem1 value1 problem2 value2 problem3 value3 1 x 500 y 1000 z 400 2 x 600 z 700 3 b y 700 z 100 4 b w 200 v 200 5 c z 500 v 500 6 c x 1000 w 100 v 900", fill = t)
we can first melt
specifying patterns
in measure
'long' format , dcast
fun.aggregate
sum
dcast(melt(dt, measure = patterns("^value", "^problem"), value.name = c("value", "problem"))[problem != "" ][, problem := factor(problem, levels = c("x", "y", "z", "w", "v"))], type ~problem, value.var = "value", sum, na.rm = true) # type x y z w v #1: 1100 1000 1100 0 0 #2: b 0 700 100 200 200 #3: c 1000 0 500 100 1400
melt
data.table
can take multiple patterns
in measure
argument. so, when "^value"
matches columns have names start (^
) "value" , "problem" , create 2 'value' columns. in above, naming columns 'value' , 'problem' value.name
argument. dataset having blanks, long format have blank elements remove problem != ""
. next step important if need have columns in specific order. so, change 'problem' factor
class , specified levels
in order. now, melt
part completed. long format changed 'wide' dcast
specifying formula, value.var
column , fun.aggregate
(here sum
)
Comments
Post a Comment