bash - Calculating mean from values in columns specified on the first line using awk

July 15, 2012

I have a huge file (hundreds of lines, ca. 4,000 columns) structured like this:

    locus   1       1       1       2       2       3       3       3
    exon    1       2       3       1       2       1       2       3
    data1   17.07   7.11    10.58   10.21   19.34   14.69   3.32    21.07
    data2   21.42   11.46   7.88    9.89    27.24   12.40   0.58    19.82

and I need to calculate mean values (on each data line separately) over the columns that share the same locus number (i.e., the same number in the first line). For example:

for data1: the mean of the first 3 values (the three columns with locus '1': 17.07, 7.11, 10.58), then the mean of the next 2 values (10.21, 19.34), then the mean of the next 3 values (14.69, 3.32, 21.07).

I would like to have output like this:

    data1   mean1   mean2   mean3
    data2   mean1   mean2   mean3

I was thinking about using bash or awk... Thank you for any advice.

If it were me, I would use R, not awk:

    library(data.table)
    x = fread('data.txt')

    #> x
    #      V1    V2    V3    V4    V5    V6    V7   V8    V9
    #1: locus  1.00  1.00  1.00  2.00  2.00  3.00 3.00  3.00
    #2:  exon  1.00  2.00  3.00  1.00  2.00  1.00 2.00  3.00
    #3: data1 17.07  7.11 10.58 10.21 19.34 14.69 3.32 21.07
    #4: data2 21.42 11.46  7.88  9.89 27.24 12.40 0.58 19.82

    # save the first column of names for later
    cnames = x$V1

    # remove the first column
    x[, V1 := NULL]

    # matrix transpose: makes rows into columns
    x = t(x)

    # convert the matrix back to a data.table
    x = data.table(x, keep.rownames = FALSE)

    # set the column names
    colnames(x) = cnames

    #> x
    #   locus exon data1 data2
    #1:     1    1 17.07 21.42
    #...

    # ditch the useless column
    x[, exon := NULL]

    #> x
    #   locus data1 data2
    #1:     1 17.07 21.42

    # apply the mean() function to each column, grouped by locus
    x[, lapply(.SD, mean), by = locus]

    #   locus    data1    data2
    #1:     1 11.58667 13.58667
    #2:     2 14.77500 18.56500
    #3:     3 13.02667 10.93333

For convenience, here's the whole thing again without comments:

    library(data.table)
    x = fread('data.txt')
    cnames = x$V1
    x[, V1 := NULL]
    x = t(x)
    x = data.table(x, keep.rownames = FALSE)
    colnames(x) = cnames
    x[, exon := NULL]
    x[, lapply(.SD, mean), by = locus]
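Since the question asked about awk, here is a minimal sketch of the same grouping done in awk. It is not part of the R answer above. The function name `group_means` is hypothetical, and the sketch assumes the file is whitespace-separated, that line 1 holds the locus numbers, and that line 2 is the exon row and can be skipped. It prints one row per data line with means in the order the locus numbers first appear, which is the row-wise layout the question asked for.

```shell
#!/usr/bin/env bash

# group_means FILE  (hypothetical helper, not from the original answer)
# For each data row, print the row name followed by the mean of the values
# in each group of columns that share a locus number on line 1.
group_means() {
    LC_ALL=C awk '
    NR == 1 {
        # Remember the locus of every column and the order loci first appear.
        for (i = 2; i <= NF; i++) {
            locus[i] = $i
            if (!($i in seen)) { seen[$i] = 1; order[++n] = $i }
        }
        next
    }
    NR == 2 { next }    # assumption: the second line is the exon row
    {
        # Reset the per-row accumulators (portable way to clear an array).
        split("", sum); split("", cnt)
        for (i = 2; i <= NF; i++) {
            sum[locus[i]] += $i
            cnt[locus[i]]++
        }
        printf "%s", $1
        for (k = 1; k <= n; k++) {
            key = order[k]
            if (cnt[key] > 0) printf "\t%.4f", sum[key] / cnt[key]
            else              printf "\tNA"    # row shorter than the header
        }
        printf "\n"
    }' "$1"
}
```

Calling `group_means data.txt` on the sample from the question should print the rows `data1 11.5867 14.7750 13.0267` and `data2 13.5867 18.5650 10.9333` (tab-separated), which match the R result above once rounded to four decimals.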
