bash - Calculating mean from values in columns specified on the first line using awk


I have a huge file (hundreds of lines, ca. 4,000 columns) structured like this:

    locus   1       1       1       2       2       3       3       3
    exon    1       2       3       1       2       1       2       3
    data1   17.07   7.11    10.58   10.21   19.34   14.69   3.32    21.07
    data2   21.42   11.46   7.88    9.89    27.24   12.40   0.58    19.82

I need to calculate mean values (on each data line separately) over the columns that share the same locus number (i.e., the same number in the first line). For example:

For data1: the mean of the first 3 values (the three columns with locus '1': 17.07, 7.11, 10.58), then the mean of the next 2 values (10.21, 19.34), then the mean of the next 3 values (14.69, 3.32, 21.07).

I would like to have output like this:

    data1   mean1   mean2   mean3
    data2   mean1   mean2   mean3

I was thinking of using bash and awk... Thank you for any advice.

If it were me, I would use R, not awk:

    library(data.table)
    x = fread('data.txt')

    #> x
    #      V1    V2    V3    V4    V5    V6    V7   V8    V9
    #1: locus  1.00  1.00  1.00  2.00  2.00  3.00 3.00  3.00
    #2:  exon  1.00  2.00  3.00  1.00  2.00  1.00 2.00  3.00
    #3: data1 17.07  7.11 10.58 10.21 19.34 14.69 3.32 21.07
    #4: data2 21.42 11.46  7.88  9.89 27.24 12.40 0.58 19.82

    # save the first column of names for later
    cnames = x$V1

    # remove the first column
    x[,V1:=NULL]

    # matrix transpose: turns rows into columns
    x = t(x)

    # convert the matrix back to a data.table
    x = data.table(x,keep.rownames=FALSE)

    # set the column names
    colnames(x) = cnames

    #> x
    #   locus exon data1 data2
    #1:     1    1 17.07 21.42
    #...

    # ditch the useless column
    x[,exon:=NULL]

    #> x
    #   locus data1 data2
    #1:     1 17.07 21.42

    # apply the mean() function to each column, grouped by locus
    x[,lapply(.SD,mean),locus]

    #   locus    data1    data2
    #1:     1 11.58667 13.58667
    #2:     2 14.77500 18.56500
    #3:     3 13.02667 10.93333

For convenience, here's the whole thing again without comments:

    library(data.table)
    x = fread('data.txt')
    cnames = x$V1
    x[,V1:=NULL]
    x = t(x)
    x = data.table(x,keep.rownames=FALSE)
    colnames(x) = cnames
    x[,exon:=NULL]
    x[,lapply(.SD,mean),locus]
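Since the question asked about awk, here is a rough sketch of how the same grouping could be done there. It is not part of the R answer above. It assumes the file is whitespace-separated, that the first line holds the locus numbers, and that the row to skip is labelled `exon`. The function name `locus_means` and the four-decimal output format are made up for illustration. Note that it prints one row per data line (the layout requested in the question), whereas the R result has one row per locus.

```shell
# Recreate the sample input from the question in a file named data.txt
cat > data.txt <<'EOF'
locus 1 1 1 2 2 3 3 3
exon 1 2 3 1 2 1 2 3
data1 17.07 7.11 10.58 10.21 19.34 14.69 3.32 21.07
data2 21.42 11.46 7.88 9.89 27.24 12.40 0.58 19.82
EOF

# locus_means FILE: print, for every data row, the mean per locus group
# (hypothetical helper name, used only in this sketch)
locus_means() {
    awk '
    NR == 1 {
        # First line: remember which locus each column belongs to,
        # and the order in which the loci first appear.
        for (i = 2; i <= NF; i++) {
            locus[i] = $i
            if (!($i in seen)) { seen[$i] = 1; order[++n] = $i }
        }
        next
    }
    $1 == "exon" { next }    # assumption: the exon row is not needed
    {
        # Reset the per-row accumulators (portable way to clear an array)
        split("", sum); split("", cnt)
        for (i = 2; i <= NF; i++) {
            sum[locus[i]] += $i
            cnt[locus[i]]++
        }
        printf "%s", $1
        for (k = 1; k <= n; k++)
            printf "\t%.4f", sum[order[k]] / cnt[order[k]]
        print ""
    }' "$1"
}

locus_means data.txt
# data1	11.5867	14.7750	13.0267
# data2	13.5867	18.5650	10.9333
```

The means agree with the R output above after rounding. Because the column-to-locus mapping is read once from the first line, this should work for any number of columns and does not require columns of the same locus to be adjacent.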
