shell - Bash group by on the basis of n number of columns
This is related to a previous question I asked (bash command group count).
What if I want to generalize this? For instance, take this input file:
    abc|1|2
    abc|3|4
    bcd|7|2
    abc|5|6
    bcd|3|5
The output should be:
    abc|9|12
    bcd|10|7
The result is calculated by grouping on the first column and adding up the values of the 2nd column and of the 3rd column, similar to a GROUP BY in SQL.
I tried modifying the command provided in the link but failed. I don't know whether I'm making a conceptual error or a silly mistake, but none of the commands below work.
Commands used:
    awk -F "|" '{arr[$1]+=$2} END arr2[$1]+=$5 END {for (i in arr) {print i"|"arr[i]"|"arr2[i]}}' sample
    awk -F "|" '{arr[$1]+=$2} END {arr2[$1]+=$5} END {for (i in arr) {print i"|"arr[i]"|"arr2[i]}}' sample
    awk -F "|" '{arr[$1]+=$2 arr2[$1]+=$5} END {for (i in arr2) {print i"|"arr[i]"|"arr2[i]}}' sample
Additionally, I'm limiting myself here to summing just 2 columns. What if there are n columns and I want to perform different operations, such as addition on one column and subtraction on another? How can this be modified further?
Example:
    abc|1|2|4|......... up to n columns
    abc|4|5|6|......... up to n columns
    def|1|4|6|......... up to n columns
Let's say a sum is needed for the first column, an average for the second, some other operation for the third, and so on. How can this be tackled?
For 3 fields (a key and 2 data fields):
$ awk '
BEGIN { FS=OFS="|" }        # set the separators
{
    a[$1]+=$2               # sum the second field into hash a
    b[$1]+=$3               # ... and the third field into hash b
}
END {                       # in the end
    for(i in a)             # loop over the keys
        print i,a[i],b[i]   # and output
}' file
bcd|10|7
abc|9|12
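Applied to the one-liner attempted in the question, the same idea might look like the sketch below. The attempts appear to fail for three reasons: the third column is $3 rather than $5, two statements in one block need a ";" or a newline between them, and the summing belongs in the main block with a single END block in braces for the output. The temporary file stands in for the question's sample file, and the sort is only added here because the order of "for (i in arr)" is not defined.

```shell
# Temporary stand-in for the "sample" file from the question.
sample=$(mktemp)
printf '%s\n' 'abc|1|2' 'abc|3|4' 'bcd|7|2' 'abc|5|6' 'bcd|3|5' > "$sample"

# Both sums live in the main block, separated by ";".
# One END block, in braces, prints the totals per key.
result=$(awk -F '|' '{ arr[$1] += $2; arr2[$1] += $3 }
    END { for (i in arr) print i "|" arr[i] "|" arr2[i] }' "$sample" | sort)

echo "$result"
rm -f "$sample"
```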
A more generic solution for n columns, using GNU awk:
$ awk '
BEGIN { FS=OFS="|" }
{
    for(i=2;i<=NF;i++)                  # loop over all data fields
        a[$1][i]+=$i                    # sum them into the related cells
    a[$1][1]=i                          # store the loop end value (NF+1) in the first cell
}
END {
    for(i in a) {
        b=""                            # reset the output buffer
        for(j=2;j<a[i][1];j++)          # buffer the output
            b=b (b==""?"":OFS)a[i][j]
        print i,b                       # output
    }
}' file
bcd|10|7
abc|9|12
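Arrays of arrays such as a[$1][i] are a GNU awk extension. For other awk implementations, a minimal sketch of the same approach can use composite keys instead (sum[$1, i], joined internally with SUBSEP). The array names sum and nf are chosen here for illustration, and the sketch assumes each group should be printed as wide as its longest record.

```shell
# Temporary stand-in for the input file from the question.
sample=$(mktemp)
printf '%s\n' 'abc|1|2' 'abc|3|4' 'bcd|7|2' 'abc|5|6' 'bcd|3|5' > "$sample"

result=$(awk '
BEGIN { FS = OFS = "|" }
{
    for (i = 2; i <= NF; i++)
        sum[$1, i] += $i              # composite key: (group, column)
    if (NF > nf[$1]) nf[$1] = NF      # remember the widest record per group
}
END {
    for (k in nf) {
        line = k
        for (j = 2; j <= nf[k]; j++)  # append each column total
            line = line OFS sum[k, j]
        print line
    }
}' "$sample" | sort)                  # sort only to get a stable order

echo "$result"
rm -f "$sample"
```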
The latter was only tested with 2 data fields (busy at a meeting :).
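For the second part of the question (a different operation per column), one possible sketch is to pass a list of operations, one per data column, and keep enough state to compute each of them. The ops variable and the operation names sum, avg and max are hypothetical, made up for this illustration. The input uses only the first three data values of the question's example, since the remaining columns are elided there.

```shell
# Temporary input built from the visible part of the question's example.
sample=$(mktemp)
printf '%s\n' 'abc|1|2|4' 'abc|4|5|6' 'def|1|4|6' > "$sample"

# ops is a hypothetical, comma-separated list: one operation per data column.
result=$(awk -v ops='sum,avg,max' '
BEGIN { FS = OFS = "|"; n = split(ops, op, ",") }
{
    cnt[$1]++                                  # rows seen per group
    for (i = 1; i <= n; i++) {
        v = $(i + 1) + 0                       # force a numeric value
        sum[$1, i] += v
        if (cnt[$1] == 1 || v > max[$1, i])    # first row or new maximum
            max[$1, i] = v
    }
}
END {
    for (k in cnt) {
        line = k
        for (i = 1; i <= n; i++) {
            if (op[i] == "sum")      val = sum[k, i]
            else if (op[i] == "avg") val = sum[k, i] / cnt[k]
            else if (op[i] == "max") val = max[k, i]
            else                     val = "?"  # unknown operation name
            line = line OFS val
        }
        print line
    }
}' "$sample" | sort)                           # sort only to get a stable order

echo "$result"
rm -f "$sample"
```

With this input the group abc gets the sum of column 2, the average of column 3 and the maximum of column 4. Other operations, such as subtraction or a minimum, would need their own state array and branch.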