Does R bigmemory always use a backing file?
We are trying to use the bigmemory library with foreach to parallelize our analysis. However, the as.big.matrix function always seems to use a backing file. Our workstations have enough memory; is there a way to use bigmemory without a backing file?
This code:

x.big.desc <- describe(as.big.matrix(x))
is pretty slow, as it writes the data to C:\ProgramData\boost_interprocess\. It is somehow even slower than saving x directly; does as.big.matrix have slower I/O?

This code:

x.big.desc <- describe(as.big.matrix(x, backingfile = ""))
is pretty fast; however, it still saves a copy of the data in the %TMP% directory. We think it is fast because R kicks off a background writing process instead of writing the data itself (we can see the writing thread in Task Manager after the R prompt returns).
Is there a way to use bigmemory with RAM only, so that each worker in the foreach loop can access the data via RAM?
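For reference, a minimal sketch of the kind of workflow we have in mind, with a toy-sized matrix and a 3-core doParallel cluster assumed for concreteness (the real data is much larger); each worker attaches to the shared data via the descriptor:

library(bigmemory)
library(foreach)
library(doParallel)

x <- matrix(rnorm(1e6), 1e3, 1e3)         # toy stand-in for the real data
x.big.desc <- describe(as.big.matrix(x))  # the step that touches the backing store

cl <- makeCluster(3)
registerDoParallel(cl)
res <- foreach(j = 1:3, .combine = 'c', .packages = "bigmemory") %dopar% {
  x.big <- attach.big.matrix(x.big.desc)  # each worker attaches to the same shared data
  sum(x.big[, j])
}
stopCluster(cl)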
Thanks for the help.
So, if you have enough RAM, just use standard R matrices. To pass only a part of each matrix to each cluster, use rds files.
One example, computing colSums with 3 cores:
# Functions for splitting
CutBySize <- function(m, nb) {
  int <- m / nb
  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))
  cbind(lower, upper, size)
}
seq2 <- function(lims) seq(lims[1], lims[2])

# The matrix
bm <- matrix(1, 10e3, 1e3)
ncores <- 3
intervals <- CutBySize(ncol(bm), ncores)

# Save each part in a different file
tmpfile <- tempfile()
for (ic in seq_len(ncores)) {
  saveRDS(bm[, seq2(intervals[ic, ])],
          paste0(tmpfile, ic, ".rds"))
}

# Parallel computation, reading one part at the beginning of each task
cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
library(foreach)
colsums <- foreach(ic = seq_len(ncores), .combine = 'c') %dopar% {
  bm.part <- readRDS(paste0(tmpfile, ic, ".rds"))
  colSums(bm.part)
}
parallel::stopCluster(cl)

# Checking results
all.equal(colsums, colSums(bm))
You can use rm(bm); gc() after writing the parts to disk.
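In the example above, that would sit right after the saveRDS loop; note that the final all.equal check against colSums(bm) then has to be dropped, since bm no longer exists. A minimal sketch:

# ... after the loop that writes the parts with saveRDS ...
rm(bm)  # drop the full in-memory matrix
gc()    # and return the memory to the OS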