Create tar files using a nested for loop in R


I have a bunch of files, named as follows:

bnt_20170301131740322_123456.csv,  bnt_20170301131740322_7891011.csv 

In the file name, characters 5 through 12 are the date and characters 13 and 14 are the hour. The rest is dynamically generated and keeps changing. In the example above, the date is 1 March 2017 and the hour is 13.
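Because the fields sit at fixed positions, base R's `substr()` is enough to pull them out. The following is a minimal sketch using the two example names above; the variable names `dates`, `hours` and `parsed` are illustrative only.

```r
# File names from the example above
files <- c("bnt_20170301131740322_123456.csv",
           "bnt_20170301131740322_7891011.csv")

# Characters 5-12 hold the date (YYYYMMDD); characters 13-14 hold the hour (HH)
dates <- substr(files, 5, 12)
hours <- substr(files, 13, 14)

# Parsing the date string confirms it is 1 March 2017
parsed <- as.Date(dates, format = "%Y%m%d")

print(data.frame(file = files, date = dates, hour = hours, parsed = parsed))
```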

Task 1: I have to create tar files bundling the files that match a specific date and hour. Depending on the dates and hours for which files get generated, I will end up with multiple tar files as output.
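One way to think about task 1 is as a grouping problem: characters 5 to 14 form a single date+hour key, and each distinct key becomes one tar file. A sketch with `split()`, assuming the hypothetical third file name below (it is not from the question):

```r
# Hypothetical file names: two share date+hour 2017030113, one is from hour 11
files <- c("bnt_20170301131740322_123456.csv",
           "bnt_20170301131740322_7891011.csv",
           "bnt_20170301110512345_42.csv")

# Characters 5-14 combine date and hour into one grouping key, e.g. "2017030113"
key <- substr(files, 5, 14)

# split() returns one character vector of file names per distinct key,
# so each list element corresponds to one tar file to be created
groups <- split(files, key)
print(groups)
```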

Task 2: The next task is to name the tar files in a specific pattern. Each tar file should be named following this pattern:

bnt_2017030111_2.tar 

In the name above you can see that "bnt_" is retained, followed by the date and hour, and the 2 after the _ (underscore) indicates the number of files inside the tar that match that date and hour. In the example above, the name indicates that files with date 1 March 2017 and hour 11 were tarred, and the tar has 2 files inside it.
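The name can be assembled from a count per date+hour key. In this sketch the keys are hypothetical, and `table()` supplies both the key (as the table's names) and the file count, so each count stays paired with its own date and hour:

```r
# Hypothetical date+hour keys (characters 5-14 of each file name)
key <- c("2017030111", "2017030111", "2017030113")

# table() counts how many files fall into each date+hour
counts <- table(key)

# "bnt_" + date + hour + "_" + number of files + ".tar"
tar_names <- paste0("bnt_", names(counts), "_", as.vector(counts), ".tar")
print(tar_names)
```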

What I have done so far:

    # set working directory
    setwd("/home/mycomp/documents/filestotar/")
    # list CSV files (anchored regex, so "." is a literal dot)
    files <- list.files(pattern = "\\.csv$")
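The `pattern` argument of `list.files()` is a regular expression, so in a pattern such as `".csv"` the dot matches any character and the match can occur anywhere in the name. A small check with `grepl()` on some hypothetical names shows the difference an anchored pattern makes:

```r
candidates <- c("bnt_20170301131740322_123456.csv",
                "bnt_2017030113_2.csv.tar",  # hypothetical name
                "notes_csv.txt")             # hypothetical name

loose  <- grepl(".csv", candidates)     # "." matches any character, no anchor
strict <- grepl("\\.csv$", candidates)  # literal dot, ".csv" must end the name

print(data.frame(candidates, loose, strict))
```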

I have listed the names of the files for reproducibility:

    files <- c("bnt_20170301000000790_123456.csv", "bnt_20170301000000887_7891011.csv",
               "bnt_20170301000000947_7430180.csv", "bnt_20170301000001001_2243094.csv",
               "bnt_20170301000001036_14195326.csv", "bnt_20170301000001036_14770776.csv",
               "bnt_20170301000001078_10692013.csv", "bnt_20170301000001089_2966772.csv",
               "bnt_20170301000001100_10890506.csv", "bnt_20170301000001576_7430180.csv")

My code:

    library(stringr)
    # extract date and time, and set a pattern to identify files in the folder
    # extracts date from file name
    d <- substr(files, 5, 12)
    # extracts hour from file name
    e <- substr(files, 13, 14)
    # creates pattern that can be used to identify files matching the pattern
    pat <- paste("bnt", "_", unique(d), unique(e), sep = "")
    # creates count of files per unique hour; used to create the name of the tar file
    f <- table(paste(d, e, sep = ""))
    # create unique names for the tar files
    g <- unique(paste("bnt", unique(d), unique(e), f, sep = "_"))
    # paste the extension .tar onto the name of the file
    h <- paste(g, ".tar", sep = "")
    # create nested for loop to tar files recursively
    for (name in h) {
      for (i in seq_along(pat)) {
        filestotar = for (i in seq_along(pat)) {list.files(path = "/home/mycomp/documents/filestotar/", pattern = pat[i])}
      }
      tar(tarfile = name, files = filestotar)
    }
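Running the naming part of this code on the reproducible list shows two things. Every listed name has date 20170301 and hour 00, so the expected result is a single archive. Also, `sep = "_"` puts an underscore between date and hour as well, and `unique(d)` and `unique(e)` are recycled independently of each other, so with several dates or hours the pieces no longer line up with the counts in `f`. A sketch that builds the name from the table's own names instead (the variable `desired` is illustrative):

```r
files <- c("bnt_20170301000000790_123456.csv", "bnt_20170301000000887_7891011.csv",
           "bnt_20170301000000947_7430180.csv", "bnt_20170301000001001_2243094.csv",
           "bnt_20170301000001036_14195326.csv", "bnt_20170301000001036_14770776.csv",
           "bnt_20170301000001078_10692013.csv", "bnt_20170301000001089_2966772.csv",
           "bnt_20170301000001100_10890506.csv", "bnt_20170301000001576_7430180.csv")

# The question's expressions
d <- substr(files, 5, 12)
e <- substr(files, 13, 14)
f <- table(paste(d, e, sep = ""))
g <- unique(paste("bnt", unique(d), unique(e), f, sep = "_"))
print(g)        # "bnt_20170301_00_10": date and hour end up separated by "_"

# Using names(f) keeps date and hour together and pairs each count with its key
desired <- paste0("bnt_", names(f), "_", as.vector(f), ".tar")
print(desired)  # "bnt_2017030100_10.tar"
```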

The above creates the required number of tar files. However, the first tar file includes all the files in the folder, and each subsequent tar file recursively includes the newly created tar files as well as the original files in the folder.

For example, the first tar file has all the CSV files instead of only those that match the pattern pat.

The second tar file has the first tar file plus all the CSV files, instead of only those that match pat.

This continues for every tar file that gets created, so the last tar file contains all the tar files created before it plus the CSV files.
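A likely cause of this behavior: in R a `for` loop evaluates to `NULL`, so the assignment `filestotar = for (...) {...}` in the nested loop stores `NULL`. `tar()` then receives `files = NULL`, and per `?tar` the default for `files` is to archive all files under the current directory, which would include tar files written earlier. The sketch below demonstrates the `NULL` value without touching the disk; the third file name and the patterns are hypothetical.

```r
files <- c("bnt_20170301131740322_123456.csv",
           "bnt_20170301131740322_7891011.csv",
           "bnt_20170301110512345_42.csv")  # hypothetical third file
pat <- c("bnt_2017030113", "bnt_2017030111")

# Like the question's inner assignment: the value of a for loop is NULL,
# whatever its body computes
loop_value <- for (i in seq_along(pat)) {
  files[startsWith(files, pat[i])]
}
print(is.null(loop_value))  # TRUE

# Collecting one vector of file names per pattern instead
selected <- lapply(pat, function(p) files[startsWith(files, p)])
print(lengths(selected))    # 2 1
```

Each element of `selected` could then be passed explicitly as `files =` to its own `tar()` call.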

The desired output is:

Tar the files that match each date and hour in the file name, and name each archive bnt_ + date + hour + _ + number of files + .tar, as follows:

bnt_2017030111_2.tar 

I have created a folder with dummy files, just in case it helps:

https://drive.google.com/open?id=0bwprnxro3c1aaun2wmmts3dpz1u 

To avoid the loop (not necessary, but it's my choice here), create a data.frame that holds information on the files. You can in turn slice and dice it to get the appropriate file names you want.

    # d (date) and e (hour) are the vectors extracted in the question's code
    xy <- data.frame(files = files, date = d, hour = e, stringsAsFactors = FALSE)
    # drop = TRUE skips date/hour combinations that have no files
    out <- split(xy, f = list(xy$date, xy$hour), drop = TRUE)
    result <- sapply(out, FUN = function(x) {
      nfiles <- nrow(x)
      name <- paste("bnt_", unique(x$date), unique(x$hour), "_", nfiles, ".tar", sep = "")
      ### show, can remove ###
      message(sprintf("for %s archiving %s files:", name, nfiles))
      for (i in x$files) {
        message(i)
      }
      ### end show ###
      tar(tarfile = name, files = x$files)
    })
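A variant of the same idea splits on one combined date+hour key rather than on `list(date, hour)`. `split()` with a single character key only creates groups that actually occur, and the group sizes give the file counts directly. This is a sketch under assumptions: the helper name `tar_by_hour` and its `dry_run` flag are hypothetical, and with `dry_run = TRUE` it only returns the planned archive names and their files without writing anything.

```r
# Hypothetical helper: one tar file per date+hour, named bnt_<date><hour>_<n>.tar
tar_by_hour <- function(files, dry_run = TRUE) {
  # Characters 5-14 of the base name hold date (YYYYMMDD) and hour (HH)
  key <- substr(basename(files), 5, 14)
  # split() on a single key only creates groups that actually occur
  groups <- split(files, key)
  names(groups) <- paste0("bnt_", names(groups), "_", lengths(groups), ".tar")
  if (!dry_run) {
    for (tar_name in names(groups)) {
      # Pass the file list explicitly; files = NULL would archive the whole directory
      utils::tar(tarfile = tar_name, files = groups[[tar_name]])
    }
  }
  groups
}

files <- c("bnt_20170301000000790_123456.csv", "bnt_20170301000000887_7891011.csv",
           "bnt_20170301000000947_7430180.csv", "bnt_20170301000001001_2243094.csv",
           "bnt_20170301000001036_14195326.csv", "bnt_20170301000001036_14770776.csv",
           "bnt_20170301000001078_10692013.csv", "bnt_20170301000001089_2966772.csv",
           "bnt_20170301000001100_10890506.csv", "bnt_20170301000001576_7430180.csv")

plan <- tar_by_hour(files)  # dry run: nothing is written
print(names(plan))          # "bnt_2017030100_10.tar"
```

Calling `tar_by_hour(files, dry_run = FALSE)` from the folder containing the CSV files would then write the archives into the working directory.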
