python - enumerate groups in a dataframe -
i have following table
date ui mw maxw tc hl msurp 01/03/2004 10 10 ec 0.25 0.1 01/04/2004 10 10 ec 0.25 -0.1 01/03/2004 b 20 20 bc 0.5 0.3 01/03/2004 b 20 20 bc 0.25 0.3 what looking add column table enumerates unique combinations of ui, mw, maxw, tc , hl , enumerates
so example in above table
unique combinations of ui, mw, maxw, tc , hl are
a,10, 10, ec, 0.25 b,20, 20, bc, 0.5 b,20, 20, bc, 0.5 there total 3 output should like
date ui mw maxw tc hl msurp counter 01/03/2004 10 10 ec 0.25 0.1 1 01/04/2004 10 10 ec 0.25 -0.1 1 01/03/2004 b 20 20 bc 0.5 0.3 2 01/03/2004 b 20 20 bc 0.25 0.3 3
option 1
pd.series.factorize
df.assign( counter=df[['ui', 'mw', 'maxw', 'tc', 'hl']].apply(tuple, 1).factorize()[0] + 1) date ui mw maxw tc hl msurp counter 0 01/03/2004 10 10 ec 0.25 0.1 1 1 01/04/2004 10 10 ec 0.25 -0.1 1 2 01/03/2004 b 20 20 bc 0.50 0.3 2 3 01/03/2004 b 20 20 bc 0.25 0.3 3 option 1.5
more obnoxious version of option 1 should faster
df.assign( counter=pd.factorize(list(zip( *[df[c].values.tolist() c in ['ui', 'mw', 'maxw', 'tc', 'hl']] )))[0] + 1 ) date ui mw maxw tc hl msurp counter 0 01/03/2004 10 10 ec 0.25 0.1 1 1 01/04/2004 10 10 ec 0.25 -0.1 1 2 01/03/2004 b 20 20 bc 0.50 0.3 2 3 01/03/2004 b 20 20 bc 0.25 0.3 3 option 2
@ayhan's answer (will delete if posts it)
df.assign( counter=df.groupby(['ui', 'mw', 'maxw', 'tc', 'hl']).ngroup() + 1) date ui mw maxw tc hl msurp counter 0 01/03/2004 10 10 ec 0.25 0.1 1 1 01/04/2004 10 10 ec 0.25 -0.1 1 2 01/03/2004 b 20 20 bc 0.50 0.3 3 3 01/03/2004 b 20 20 bc 0.25 0.3 2 timing
code below
(lambda r: r.div(r.min(1), 0).assign(best=lambda x: x.idxmin(1)))(results) pir1 pir2 ayhan best 100 17.260639 1.000000 3.438354 pir2 300 30.550010 1.000000 2.598456 pir2 1000 43.201163 1.000000 1.236190 pir2 3000 61.593932 1.000000 1.025420 pir2 10000 127.003138 2.177171 1.000000 ayhan pir1 = lambda d: d.assign(counter=d[['ui', 'mw', 'maxw', 'tc', 'hl']].apply(tuple, 1).factorize()[0] + 1) pir2 = lambda d: d.assign(counter=pd.factorize(list(zip(*[d[c].values.tolist() c in ['ui', 'mw', 'maxw', 'tc', 'hl']])))[0] + 1) ayhan = lambda d: d.assign(counter=d.groupby(['ui', 'mw', 'maxw', 'tc', 'hl']).ngroup() + 1) results = pd.dataframe( index=[100, 300, 1000, 3000, 10000], columns='pir1 pir2 ayhan'.split(), dtype=float ) in results.index: d = pd.concat([df] * i, ignore_index=true) j in results.columns: stmt = '{}(d)'.format(j) setp = 'from __main__ import d, {}'.format(j) results.set_value(i, j, timeit(stmt, setp, number=10)) results.plot(loglog=true) 
Comments
Post a Comment