r - How can I create a new column in a data frame by aggregating rows?


I have a large (~200k rows) data frame structured like this:

df <- data.frame(c(1,1,1,1,1),
                 c('blue','blue','blue','blue','blue'),
                 c('m','m','m','m','m'),
                 c(2016,2016,2016,2016,2016),
                 c(3,4,5,6,7),
                 c(10,20,30,40,50))
colnames(df) <- c('id', 'color', 'size', 'year', 'week', 'revenue')

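Let's say it is week 7, and I want to compare the trailing 4-week average of revenue to the current week's revenue. I would like to create a new column holding that average, computed only over rows where all of the identifiers (id, color, size, year) match. For week 7, the result would look like this: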

df_new <- data.frame(1, 'blue', 'm', 2016, 7, 50, 25)
colnames(df_new) <- c('id', 'color', 'size', 'year', 'week', 'revenue', 't4ave')

How can I accomplish this efficiently? Thank you for your help.

Good question. Loops are pretty inefficient, but since you have to check conditions on the prior entries, this is the only solution I can think of (mind you, I'm only intermediate at R):

# assumes rows are sorted by id, color, size, year, week
keys <- c('id', 'color', 'size', 'year')  # week is left out: it differs on every row
df$t4ave <- NA_real_
df$diff <- NA_real_

for (i in seq_len(nrow(df))) {
    if (i > 4) {
        prev <- (i - 4):(i - 1)
        # condition: identifiers of the previous 4 entries match the current entry
        # (R cannot chain ==, so each column is compared with all())
        same <- all(sapply(keys, function(k) all(df[[k]][prev] == df[[k]][i])))
        if (same) {
            # avg of the last 4 entries' revenues
            df$t4ave[i] <- mean(df$revenue[prev])
            # difference between this entry and the last 4's average
            df$diff[i] <- df$revenue[i] - df$t4ave[i]
        }
    }
}

This code will take a long time on ~200k rows, but it should work. If this is a one-time thing and the code only needs to run once, it should be okay. Otherwise, others may be able to advise.
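
For what it's worth, one way to avoid the explicit row loop is to compute the trailing mean per group with a cumulative sum. The following is only a minimal base R sketch, not a tested solution for the real data: `trailing_mean4` and `key` are names made up for illustration, and it assumes every (id, color, size, year) group has exactly one row per week with no gaps, so that "previous 4 rows" means "previous 4 weeks".

```r
# Example data from the question
df <- data.frame(
  id      = c(1, 1, 1, 1, 1),
  color   = 'blue',
  size    = 'm',
  year    = 2016,
  week    = c(3, 4, 5, 6, 7),
  revenue = c(10, 20, 30, 40, 50)
)

# Sort so that weeks are in order within each group
df <- df[order(df$id, df$color, df$size, df$year, df$week), ]

# Hypothetical helper: mean of the 4 values before each position
# (current value excluded); NA where fewer than 4 earlier values exist
trailing_mean4 <- function(x) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < 5) return(out)
  cs <- c(0, cumsum(x))                    # cs[k + 1] is sum(x[1:k])
  idx <- 5:n
  out[idx] <- (cs[idx] - cs[idx - 4]) / 4  # sum(x[(i-4):(i-1)]) / 4
  out
}

# One combined grouping key, so only observed combinations become groups
key <- paste(df$id, df$color, df$size, df$year, sep = '|')

# ave() applies the helper within each group and keeps the row order
df$t4ave <- ave(df$revenue, key, FUN = trailing_mean4)
df$diff  <- df$revenue - df$t4ave
```

On the example data, week 7 gets t4ave = 25 (the mean of 10, 20, 30, 40) and the earlier weeks get NA. If weeks can be missing inside a group, the rows would need to be padded or joined on week number first, which this sketch does not do.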

