python - using groupby/aggregate to return multiple columns -


i have example dataset want groupby 1 column , produce 4 new columns based on of values of existing columns.

here sample data:

data = {'alignmentid': {0: u'ensmust00000000001.4-1',   1: u'ensmust00000000001.4-1',   2: u'ensmust00000000003.13-0',   3: u'ensmust00000000003.13-0',   4: u'ensmust00000000003.13-0'},  'name': {0: u'noncodingdeletion',   1: u'noncodinginsertion',   2: u'codingdeletion',   3: u'codinginsertion',   4: u'noncodingdeletion'},  'value_cds': {0: nan, 1: nan, 2: 1.0, 3: 1.0, 4: nan},  'value_mrna': {0: 21.0, 1: 26.0, 2: 1.0, 3: 1.0, 4: 2.0}} df = pd.dataframe.from_dict(data) 

which looks this:

               alignmentid                name  value_mrna  value_cds 0   ensmust00000000001.4-1   noncodingdeletion        21.0        nan 1   ensmust00000000001.4-1  noncodinginsertion        26.0        nan 2  ensmust00000000003.13-0      codingdeletion         1.0        1.0 3  ensmust00000000003.13-0     codinginsertion         1.0        1.0 4  ensmust00000000003.13-0   noncodingdeletion         2.0        nan 

i want return booleans based on presence/absence of values in name column depending on whether value_cds contains null values. produced function so:

def aggfunc(s):     if s.value_cds.any():         c = set(s.name)     else:         c = set(s.name)     return ('codingdeletion' in c or 'codinginsertion' in c,              'codinginsertion' in c, 'codingdeletion' in c,              'codingmult3deletion' in c or 'codingmult3insertion' in c) 

and did this:

merged = df.groupby('alignmentid').aggregate(aggfunc) 

which gives me error valueerror: shape of passed values (318, 4), indices imply (318, 3).

how can return multiple new columns groupby-aggregate?

the output looking is:

ensmust00000000001.4-1 (false, false, false, false) ensmust00000000003.13-0 (true, true, true, false) 

which ideally put 5-column dataframe.

if use .apply, output incorrect:

ensmust00000000001.4-1     (false, false, false, false) ensmust00000000003.13-0    (false, false, false, false) 

but if grab groups 1 @ time, correct:

in [380]: aln_id, d in df.groupby('alignmentid'):    .....:     print aggfunc(d)    .....: (false, false, false, false) (true, true, true, false) 

you need change name ['name'], because .name return name of group (value of column grouping by):

def aggfunc(s):     if s.value_cds.any():         c = set(s['name'])     else:         c = set(s['name'])      return ('codingdeletion' in c or 'codinginsertion' in c,              'codinginsertion' in c, 'codingdeletion' in c,              'codingmult3deletion' in c or 'codingmult3insertion' in c)  merged = df.groupby('alignmentid').apply(aggfunc) print (merged) alignmentid ensmust00000000001.4-1     (false, false, false, false) ensmust00000000003.13-0       (true, true, true, false) dtype: object 

def aggfunc(s):      print ('name of group is: {}'.format((s.name)))       print ('column name is:\n {}'.format(s['name']))     merged = df.groupby('alignmentid').apply(aggfunc) print (merged)  name of group is: ensmust00000000001.4-1 column name is:  0     noncodingdeletion 1    noncodinginsertion name: name, dtype: object name of group is: ensmust00000000001.4-1 column name is:  0     noncodingdeletion 1    noncodinginsertion name: name, dtype: object name of group is: ensmust00000000003.13-0 column name is:  2       codingdeletion 3      codinginsertion 4    noncodingdeletion name: name, dtype: object 

improved code:

def aggfunc(s):     #if , else return same c, omitted     c = set(s['name'])      #added series return columns instead tuples     cols = ['col1','col2','col3','col4']     return pd.series(('codingdeletion' in c or 'codinginsertion' in c,              'codinginsertion' in c, 'codingdeletion' in c,              'codingmult3deletion' in c or 'codingmult3insertion' in c), index=cols)  merged = df.groupby('alignmentid').apply(aggfunc) print (merged)                            col1   col2   col3   col4 alignmentid                                         ensmust00000000001.4-1   false  false  false  false ensmust00000000003.13-0   true   true   true  false 

Comments

Popular posts from this blog

Is there a better way to structure post methods in Class Based Views -

performance - Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? -

jquery - Responsive Navbar with Sub Navbar -