python - enumerate groups in a dataframe -

September 15, 2013

i have following table

date        ui  mw  maxw    tc  hl    msurp 01/03/2004    10   10     ec  0.25   0.1 01/04/2004    10   10     ec  0.25   -0.1 01/03/2004  b   20   20     bc  0.5    0.3 01/03/2004  b   20   20     bc  0.25    0.3

what looking add column table enumerates unique combinations of ui, mw, maxw, tc , hl , enumerates

so example in above table

unique combinations of ui, mw, maxw, tc , hl are

 a,10, 10, ec, 0.25  b,20, 20, bc, 0.5  b,20, 20, bc, 0.5

there total 3 output should like

date        ui  mw  maxw    tc  hl    msurp  counter 01/03/2004    10   10     ec  0.25   0.1    1 01/04/2004    10   10     ec  0.25   -0.1   1 01/03/2004  b   20   20     bc  0.5    0.3    2 01/03/2004  b   20   20     bc  0.25    0.3   3

option 1
pd.series.factorize

df.assign(    counter=df[['ui', 'mw', 'maxw', 'tc', 'hl']].apply(tuple, 1).factorize()[0] + 1)           date ui  mw  maxw  tc    hl  msurp  counter 0  01/03/2004   10    10  ec  0.25    0.1        1 1  01/04/2004   10    10  ec  0.25   -0.1        1 2  01/03/2004  b  20    20  bc  0.50    0.3        2 3  01/03/2004  b  20    20  bc  0.25    0.3        3

option 1.5
more obnoxious version of option 1 should faster

df.assign(     counter=pd.factorize(list(zip(         *[df[c].values.tolist() c in ['ui', 'mw', 'maxw', 'tc', 'hl']]     )))[0] + 1 )           date ui  mw  maxw  tc    hl  msurp  counter 0  01/03/2004   10    10  ec  0.25    0.1        1 1  01/04/2004   10    10  ec  0.25   -0.1        1 2  01/03/2004  b  20    20  bc  0.50    0.3        2 3  01/03/2004  b  20    20  bc  0.25    0.3        3

option 2
@ayhan's answer (will delete if posts it)

df.assign(     counter=df.groupby(['ui', 'mw', 'maxw', 'tc', 'hl']).ngroup() + 1)           date ui  mw  maxw  tc    hl  msurp  counter 0  01/03/2004   10    10  ec  0.25    0.1        1 1  01/04/2004   10    10  ec  0.25   -0.1        1 2  01/03/2004  b  20    20  bc  0.50    0.3        3 3  01/03/2004  b  20    20  bc  0.25    0.3        2

timing
code below

(lambda r: r.div(r.min(1), 0).assign(best=lambda x: x.idxmin(1)))(results)               pir1      pir2     ayhan   best 100     17.260639  1.000000  3.438354   pir2 300     30.550010  1.000000  2.598456   pir2 1000    43.201163  1.000000  1.236190   pir2 3000    61.593932  1.000000  1.025420   pir2 10000  127.003138  2.177171  1.000000  ayhan

pir1 = lambda d: d.assign(counter=d[['ui', 'mw', 'maxw', 'tc', 'hl']].apply(tuple, 1).factorize()[0] + 1) pir2 = lambda d: d.assign(counter=pd.factorize(list(zip(*[d[c].values.tolist() c in ['ui', 'mw', 'maxw', 'tc', 'hl']])))[0] + 1) ayhan = lambda d: d.assign(counter=d.groupby(['ui', 'mw', 'maxw', 'tc', 'hl']).ngroup() + 1)  results = pd.dataframe(     index=[100, 300, 1000, 3000, 10000],     columns='pir1 pir2 ayhan'.split(),     dtype=float )  in results.index:     d = pd.concat([df] * i, ignore_index=true)     j in results.columns:         stmt = '{}(d)'.format(j)         setp = 'from __main__ import d, {}'.format(j)         results.set_value(i, j, timeit(stmt, setp, number=10))  results.plot(loglog=true)

Search This Blog

How Y

python - enumerate groups in a dataframe -

Comments

Post a Comment

Popular posts from this blog

meteor - inserting data to database gives error "insert failed: Method '/texts/insert' not found" -

angular - DownloadURL return null in below code -

html - unterminated string literal “onclick” event in anchor -