R - How to create all n-1 long subsets of a vector and save both the remaining vector and the removed vector efficiently?
I am toying around with building a recommender system. I have the historical purchases of users.
My data looks like this:
    > head(baskets)
    # A tibble: 6 x 2
    # Groups:   user_id [2]
      user_id basket
        <int> <list>
    1       8 <int [21]>
    2       8 <int [13]>
    3       8 <int [15]>
    4      12 <int [22]>
    5      12 <int [20]>
    6      12 <int [17]>
    > baskets$basket[[1]]
     [1]   651  1529  2078  6141  6473  9839 14992 16349 17794 20920 21903
    [12] 23165 23400 24838 28985 32030 34190 39110 39812 44099 49533
What I want is to remove one item from each basket, save it as the target item, and save the rest of the basket as a new basket. This is repeated for every item in the basket. For example, if I had a user with user_id = 1 and basket = [1,2,3], I would get
    user_id  basket  target
    1        2,3     1
    1        1,3     2
    1        1,2     3
How can I construct such a data.frame / tibble in an efficient way? I have a solution that seems to work, but it is quite slow, and since I have a large amount of data I would like to find a better solution if possible.
Currently I have
    orderdf <- data.frame(user_id = integer(), basket = list(), target = integer())
    for(k in 1:dim(baskets)[1]){
      print(k)
      currbasket <- baskets$basket[[k]]
      # one reduced basket per item: the negative index drops item i
      currbaskets <- lapply(1:length(currbasket), function(i) currbasket[-i])
      curruser <- baskets$user_id[k]
      for(j in 1:length(currbaskets)){
        tempdf <- tibble(user_id = baskets$user_id[k], basket = list(currbaskets[[j]]), target = currbasket[j])
        # append one row per removed item
        orderdf <- rbind(orderdf, tempdf)
      }
    }
First, I create a reproducible dataset for myself:
    baskets <- data.frame(user_id = 1:10)
    # give every user a basket of 3 distinct items
    for (i in 1:nrow(baskets)){
      baskets$basket[i] = list(sample(1:100, 3, replace = FALSE))
    }
    head(baskets)
Next time, please provide a reproducible dataset!
The next thing is to build a function that handles one line:
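    # take the first row as a plain list, so the basket can be stored as a vector
    x = as.list(baskets[1,])
    x$basket = x$basket[[1]]
    require(data.table)
    foraline <- function(x){
      items <- unlist(x$basket)      # works for a list column or a plain vector
      n_inbasket <- length(items)
      result <- data.table(user_id = rep(x$user_id, n_inbasket))
      # one reduced basket per row: the basket without item i
      result$basket <- lapply(seq_len(n_inbasket), function(i) items[-i])
      result$target <- items         # the removed item for each row
      return(result)
    }
    foraline(x)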
OK, and now apply it to all lines and reduce the results into one data.frame using rbindlist from the data.table package.
    require(data.table)
    order_basket <- rbindlist(apply(baskets, 1, foraline))
    head(order_basket)
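As a possible alternative, here is a minimal base R sketch that builds the three columns in one pass instead of creating and binding one small table per row. The function name leave_one_out is hypothetical (it is not part of any package), and the sketch assumes that baskets$basket is a list column of vectors, as in the question.

```r
# Hypothetical helper: expands every basket into one row per removed item.
# Assumes `basket` is a list of vectors, with one entry per element of `user_id`.
leave_one_out <- function(user_id, basket) {
  n <- lengths(basket)                        # number of items in each basket
  out <- data.frame(user_id = rep(user_id, times = n))
  # for every basket, build all versions with one item dropped,
  # then flatten one level so there is one reduced basket per output row
  out$basket <- unlist(
    lapply(basket, function(b) lapply(seq_along(b), function(i) b[-i])),
    recursive = FALSE
  )
  out$target <- unlist(basket)                # the removed item, in the same order
  out
}

# usage with the example from the question
example <- leave_one_out(1L, list(c(1L, 2L, 3L)))
```

With the data above, it could be called as leave_one_out(baskets$user_id, baskets$basket). Whether it is actually faster than the rbindlist version on large data would need to be measured.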