R - Sum of a non-uniform subset
In my project, I have a bunch of information from a bus company. I separate the data into subsets by date, and for each subset I need to show the required bus lines (which are in the "linha" column) in a barplot.
-> E.g. of a subset:
data.date[[1]] is the subset of rows that have the date "2013-03-10".
To accomplish this, I tried to sum the values in the "catraca" (ticket gate) column into a vector with one sum per "linha" (bus line). And I'm struggling hard.
This is the logic I used:
linha.sum <- with(data.date[[1]], data.date[[1]] == linha.unique, sum(data.date[[1]]$catraca))
The output is a logical vector, which is not what I want.
These pictures might help you understand the situation:
[Screenshot: View(data.date[[1]])]
The values I want are the sums of "catraca" for each "linha".
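A likely reason for the logical output: `with()` only evaluates its second argument inside the data frame, so `data.date[[1]] == linha.unique` compares every cell against the vector, and the `sum()` passed as a third argument is silently ignored. A minimal sketch of the per-line sum for a single subset, using base R's `tapply()`; the data frame `day` is a hypothetical stand-in for `data.date[[1]]` with made-up values:

```r
# Hypothetical stand-in for data.date[[1]]:
# "linha" = bus line, "catraca" = ticket-gate count per trip
day <- data.frame(linha   = c(3, 3, 6, 7, 7),
                  catraca = c(0, 37, 82, 61, 46))

# with() evaluates ONE expression inside the data frame;
# tapply() groups catraca by linha and sums each group
linha.sum <- with(day, tapply(catraca, linha, sum))
print(linha.sum)
```

The result is a named one-dimensional array (one entry per linha), which `barplot()` accepts directly.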
Sample of the data:
data.dates <- list(read.table(text = "linha dsaida hsaida dchegada hchegada sentido catraca embarcado
3 2016-01-01 04:05 2016-01-01 04:15 0 0 0
3 2016-01-01 04:23 2016-01-01 23:57 0 37 0
3 2016-01-01 04:05 2016-01-01 04:15 0 0 0
3 2016-01-01 04:22 2016-01-01 23:58 0 83 0
3 2016-01-01 04:04 2016-01-01 04:15 0 0 0
3 2016-01-01 04:23 2016-01-01 23:58 0 43 0
6 2016-01-01 03:49 2016-01-01 13:41 0 82 0
6 2016-01-01 13:43 2016-01-01 23:09 0 98 0
7 2016-01-01 03:54 2016-01-01 14:49 0 61 0
7 2016-01-01 14:51 2016-01-01 23:10 0 46 0", header = TRUE))
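For the sample above, `xtabs()` is another base R way to get the per-line totals; it returns a table that `barplot()` accepts directly. This sketch re-enters the sample rows (one per line, so `read.table()` can parse them) and only draws the plot in an interactive session; the axis labels are illustrative choices:

```r
sample.text <- "linha dsaida hsaida dchegada hchegada sentido catraca embarcado
3 2016-01-01 04:05 2016-01-01 04:15 0 0 0
3 2016-01-01 04:23 2016-01-01 23:57 0 37 0
3 2016-01-01 04:05 2016-01-01 04:15 0 0 0
3 2016-01-01 04:22 2016-01-01 23:58 0 83 0
3 2016-01-01 04:04 2016-01-01 04:15 0 0 0
3 2016-01-01 04:23 2016-01-01 23:58 0 43 0
6 2016-01-01 03:49 2016-01-01 13:41 0 82 0
6 2016-01-01 13:43 2016-01-01 23:09 0 98 0
7 2016-01-01 03:54 2016-01-01 14:49 0 61 0
7 2016-01-01 14:51 2016-01-01 23:10 0 46 0"
day <- read.table(text = sample.text, header = TRUE)

# Sum catraca for each distinct value of linha
totals <- xtabs(catraca ~ linha, data = day)
print(totals)

# Draw the bar chart only when running interactively
if (interactive()) {
  barplot(totals, xlab = "linha (bus line)", ylab = "catraca (total)")
}
```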
Since data.dates seems to be a list of data.frames (probably created with split()), the sums of a column within each of these data sets can be obtained with lapply().
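How the list was actually built is not shown, so this is an assumption: a sketch in which a hypothetical unsplit data frame `bus` (date in `dsaida`, as in the sample, with made-up values) is split by date with `split()`, and the same per-line sum is then applied to every element with `lapply()`:

```r
# Hypothetical unsplit data spanning two dates (column names follow the question)
bus <- data.frame(
  dsaida  = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02"),
  linha   = c(3, 6, 3, 3, 6),
  catraca = c(37, 82, 10, 5, 20)
)

# split() returns a named list with one data.frame per date
data.dates <- split(bus, bus$dsaida)

# lapply() runs the per-linha sum on each date's data.frame
sums.by.date <- lapply(data.dates, function(x) tapply(x$catraca, x$linha, sum))
print(sums.by.date)
```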
Here is the reproducible data:
data.dates <- list(data.frame(linha = c(3, 3, 1201, 1201), catraca = c(0, 37, 2, 22)))
With dplyr
library(dplyr)

lapply(data.dates, function(x) {
  x %>%
    group_by(linha) %>%
    summarize(catsum = sum(catraca))
})
# [[1]]
# # A tibble: 2 x 2
#   linha catsum
#   <dbl>  <dbl>
# 1     3     37
# 2  1201     24
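This returns, for each data.frame within the list, a summary with one row per linha and a column catsum holding the sum of catraca. Since each list element holds a single date, this gives the sum per group of date and linha.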
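If the data have not been split yet, both grouping variables can be handled in one step instead. A base R sketch using the formula interface of `aggregate()`, assuming the same hypothetical layout as before (date in `dsaida`, made-up values):

```r
# Hypothetical unsplit data spanning two dates
bus <- data.frame(
  dsaida  = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02"),
  linha   = c(3, 6, 3, 3, 6),
  catraca = c(37, 82, 10, 5, 20)
)

# One output row per (dsaida, linha) combination present in the data
totals <- aggregate(catraca ~ dsaida + linha, data = bus, FUN = sum)
print(totals)
```

This keeps all dates in one data frame, which can be convenient if a single grouped or faceted plot is wanted instead of one barplot per date.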
With base R
Following @sagars' comment, use aggregate() inside lapply():
lapply(data.dates, function(x) {
  aggregate(x$catraca, by = list(linha = x$linha), FUN = sum)
})
# [[1]]
#   linha  x
# 1     3 37
# 2  1201 24
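`rowsum()` is another base R function designed specifically for grouped sums. A sketch applying it to the reproducible `data.dates` (re-created so the block runs on its own); note that it returns a one-column matrix with the linha values as row names rather than a data.frame:

```r
data.dates <- list(data.frame(linha = c(3, 3, 1201, 1201), catraca = c(0, 37, 2, 22)))

# rowsum() sums catraca within each linha; groups are sorted ascending
res <- lapply(data.dates, function(x) rowsum(x$catraca, group = x$linha))
print(res)
```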
Benchmarking
In fact, microbenchmark() reveals that the base solution is (as is often the case) faster here. However, I only tested it on the small subset given in the OP.
# Unit: microseconds
#   expr      min       lq      mean    median        uq      max neval cld
#  dplyr 1803.553 1878.499 1994.4945 1918.8880 2016.8730 6495.747   100   b
#   base  481.535  513.818  543.4041  538.1365  560.4635  803.222   100  a
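Timings on a handful of rows say little about a full day of data. A rough sketch for re-checking on a larger, synthetic data frame (sizes and values are made up), using base `system.time()` instead of microbenchmark, and first confirming that the base approaches return identical totals. It makes no claim about which one wins on real data:

```r
set.seed(42)
n <- 1e5
# Synthetic stand-in for one day's data (hypothetical): 50 bus lines, random gate counts
big <- data.frame(linha   = sample(1:50, n, replace = TRUE),
                  catraca = sample(0:100, n, replace = TRUE))

# Three base R ways to get per-linha totals; each orders groups by ascending linha
agg <- aggregate(catraca ~ linha, data = big, FUN = sum)
tap <- tapply(big$catraca, big$linha, sum)
rs  <- rowsum(big$catraca, group = big$linha)

# Sanity check before timing: all three must agree
stopifnot(all(agg$catraca == as.vector(tap)), all(agg$catraca == rs[, 1]))

# Rough timings over a few repetitions (results vary by machine)
reps <- 5
print(system.time(for (i in seq_len(reps)) aggregate(catraca ~ linha, data = big, FUN = sum)))
print(system.time(for (i in seq_len(reps)) tapply(big$catraca, big$linha, sum)))
print(system.time(for (i in seq_len(reps)) rowsum(big$catraca, group = big$linha)))
```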