r - Sum of a non-uniform subset -


In my project, I have a bunch of information from a bus company. I separated it into subsets by date, so that I can see the required bus lines (which are in the "linha" column) in a barplot.

An example of such a subset:

data.date[[1]] is the equivalent of a subset of the rows that have the date "2013-03-10".

To accomplish this, I tried to sum the values in the "catraca" (ticket gate) column into a vector, one value for each of the different "linhas" (bus lines). And I'm struggling hard.

This is the logic I used:

linha.sum <- with(data.date[[1]], data.date[[1]] == linha.unique, sum(data.date[[1]]$catraca)) 

The output is a logical vector, which is not what I wanted.

These pictures might help to understand the situation:

View(data.date[[1]])

picture of sample

The values I want to sum are the "catraca" values of the different "linha".

A sample of the data:

data.dates <- list(read.table(text = "linha     dsaida hsaida   dchegada hchegada sentido catraca embarcado
3 2016-01-01  04:05 2016-01-01    04:15       0       0         0
3 2016-01-01  04:23 2016-01-01    23:57       0      37         0
3 2016-01-01  04:05 2016-01-01    04:15       0       0         0
3 2016-01-01  04:22 2016-01-01    23:58       0      83         0
3 2016-01-01  04:04 2016-01-01    04:15       0       0         0
3 2016-01-01  04:23 2016-01-01    23:58       0      43         0
6 2016-01-01  03:49 2016-01-01    13:41       0      82         0
6 2016-01-01  13:43 2016-01-01    23:09       0      98         0
7 2016-01-01  03:54 2016-01-01    14:49       0      61         0
7 2016-01-01  14:51 2016-01-01    23:10       0      46         0", header = TRUE))

Since data.dates seems to be a list of data.frames (probably created with split()), the sums of a column within each of these data sets can be obtained with lapply().
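As a minimal sketch of that idea, assuming the list was built by splitting one data frame on its date column: the data frame `trips` and its values below are made up for illustration, and only the column names come from the question.

```r
# Hypothetical data frame; only the column names follow the question.
trips <- data.frame(
  dsaida  = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"),
  linha   = c(3, 6, 3, 3),
  catraca = c(37, 82, 10, 5)
)

# split() returns a named list with one data.frame per date.
data.dates <- split(trips, trips$dsaida)

# lapply() applies the same function to every element of the list;
# here it sums the whole "catraca" column of each date.
total.per.date <- lapply(data.dates, function(x) sum(x$catraca))
```

Summing per bus line within each date only requires replacing the function passed to lapply() with one that groups by linha first.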

Here is some reproducible data:

data.dates <- list(data.frame(
  linha = c(3, 3, 1201, 1201),
  catraca = c(0, 37, 2, 22)
))

With dplyr

library(dplyr)
lapply(data.dates, function(x) {
  x %>% group_by(linha) %>% summarize(catsum = sum(catraca))
})
# [[1]]
# # tibble: 2 x 2
#    linha         catsum
#    <dbl>          <dbl>
# 1     3             37
# 2  1201             24

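For each data.frame within the list, this returns a table with one row per linha and a column catsum containing the sum for that group. Because each list element holds a single date, the sums are effectively per date and linha.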

With base R

Following @sagars' comment, use aggregate() inside lapply():

lapply(data.dates, function(x) {
  aggregate(x$catraca, by = list(linha = x$linha), FUN = sum)
})
# [[1]]
#   linha  x
# 1     3 37
# 2  1201 24
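Since the goal in the question is a barplot, a named vector may be more convenient than a data.frame. The following is a sketch using tapply(), which is a swapped-in alternative to aggregate() and not part of the original answer; the name linha.sum is borrowed from the question.

```r
# Same reproducible data as above.
data.dates <- list(data.frame(
  linha   = c(3, 3, 1201, 1201),
  catraca = c(0, 37, 2, 22)
))

# tapply() sums catraca within each linha and returns a vector
# whose names are the linha values.
linha.sum <- lapply(data.dates, function(x) {
  tapply(x$catraca, x$linha, sum)
})
```

Each element can then be passed straight to barplot(), for example barplot(linha.sum[[1]]), which should label the bars with the bus line numbers.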

Benchmarking

In fact, microbenchmark() reveals that the base solution is (as is often the case) faster here. However, this was only tested on the small subset given in the OP.

# Unit: microseconds
#   expr      min       lq      mean    median        uq      max neval cld
#  dplyr 1803.553 1878.499 1994.4945 1918.8880 2016.8730 6495.747   100   b
#   base  481.535  513.818  543.4041  538.1365  560.4635  803.222   100  a
