r - How can I create a new column in a data frame by aggregating rows?
I have a large (~200k rows) data frame structured like this:
df <- data.frame(c(1,1,1,1,1), c('blue','blue','blue','blue','blue'), c('m','m','m','m','m'), c(2016,2016,2016,2016,2016), c(3,4,5,6,7), c(10,20,30,40,50))
colnames(df) <- c('id', 'color', 'size', 'year', 'week', 'revenue')
Let's say it is week 7, and I want to compare the trailing 4-week average of revenue to the current week's revenue. I would create a new column holding that average when all of the identifiers match.
df_new <- data.frame(1, 'blue', 'm', 2016, 7, 50, 25)
colnames(df_new) <- c('id', 'color', 'size', 'year', 'week', 'revenue', 't4ave')
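For reference, the 25 in the t4ave column is the mean revenue of the four weeks before week 7 (weeks 3 to 6), so the current week is not part of its own trailing average. A quick check of that arithmetic, where rev_by_week is just an illustrative name and not part of the real data:

```r
# Revenue per week, taken from the example data frame above
rev_by_week <- data.frame(week = 3:7, revenue = c(10, 20, 30, 40, 50))

# Trailing window for week 7 = the four weeks before it (3 to 6)
t4ave <- mean(rev_by_week$revenue[rev_by_week$week %in% 3:6])
t4ave  # 25
```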
How can I accomplish this efficiently? Thank you for your help.
Good question. Loops are pretty inefficient, but since you have to check the conditions of the prior entries, this is the only solution I can think of (mind you, I'm only intermediate at R):
# assumes rows are sorted by the identifiers, then by week
key_cols <- c('id', 'color', 'size', 'year')
df$t4ave <- NA_real_
df$diff <- NA_real_
for (i in seq_len(nrow(df))) {
  if (i > 4) {
    prev <- (i - 4):(i - 1)
    # condition: identifiers of the previous 4 entries match those of entry i
    same_group <- all(sapply(key_cols, function(k) all(df[[k]][prev] == df[[k]][i])))
    if (same_group) {
      # avg of last 4 entries' revenues
      df$t4ave[i] <- mean(df$revenue[prev])
      # difference between this entry and the last 4's average
      df$diff[i] <- df$revenue[i] - df$t4ave[i]
    }
  }
}
This code will take a long time on ~200k rows, but it should work. If the code only needs to run once, that should be okay. Otherwise, others may be able to advise on something faster.
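If speed does matter, one possible alternative is to compute the trailing mean per group with base R's ave() and cumsum() instead of looping over rows. This is only a sketch under some assumptions: trailing_mean is a helper name made up here, the window is the 4 previous rows within each id/color/size/year group (so weeks must be consecutive with no gaps), and the window restarts at each new year because year is one of the identifiers.

```r
df <- data.frame(
  id = c(1, 1, 1, 1, 1),
  color = rep('blue', 5),
  size = rep('m', 5),
  year = rep(2016, 5),
  week = 3:7,
  revenue = c(10, 20, 30, 40, 50)
)

# Hypothetical helper: mean of the n values before each position,
# NA wherever fewer than n prior values exist.
trailing_mean <- function(x, n = 4) {
  len <- length(x)
  if (len <= n) return(rep(NA_real_, len))
  cs <- c(0, cumsum(x))   # cs[k + 1] = sum of x[1..k]
  idx <- (n + 1):len      # positions with a full window behind them
  c(rep(NA_real_, n), (cs[idx] - cs[idx - n]) / n)
}

# Sort so that each group's rows are in week order
df <- df[order(df$id, df$color, df$size, df$year, df$week), ]

# Apply the helper within each combination of identifiers
df$t4ave <- ave(df$revenue, df$id, df$color, df$size, df$year,
                FUN = trailing_mean)
df$diff <- df$revenue - df$t4ave

df[df$week == 7, ]  # week 7: t4ave = 25, diff = 25
```

Rows that do not have 4 earlier weeks in the same group get NA, which matches the loop above leaving those rows unfilled.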