python - How to add an additional sum column to a DataFrame based on specific column groups?


In this case, I have a DataFrame like

col1  col2
   a     1
   a     2
   a     3
   b     1
   b     2

What I want is to first group by col1, sum the col2 values of each group, then add that sum back to the DataFrame and get

col1  col2  sum
   a     1    6
   a     2    6
   a     3    6
   b     1    3
   b     2    3
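To try the answers below, the example frame has to exist first. Here is a minimal sketch, assuming pandas is installed and that the two group labels are `a` and `b`. The name `group_totals` is only illustrative.

```python
import pandas as pd

# Example data: assumed labels 'a' and 'b' in col1, integers in col2.
df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b'],
    'col2': [1, 2, 3, 1, 2],
})

# Per-group totals that the new column should repeat on every row.
group_totals = df.groupby('col1')['col2'].sum()
print(group_totals)
```

The totals have one entry per group, so the remaining task is to spread them back over the original rows.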

Option 1
transform returns a result with the same index as the original object.
Use assign to return a copy of the DataFrame with the new column.
See the split-apply-combine documentation for more information.

df.assign(sum=df.groupby('col1').col2.transform('sum'))

  col1  col2  sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
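The difference between a plain group sum and transform is easiest to see side by side. This is a small sketch on the same example data, assuming the first group is labeled `a`. The variable names `reduced`, `broadcast` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b'],
    'col2': [1, 2, 3, 1, 2],
})

grouped = df.groupby('col1')['col2']

# sum() reduces: one value per group, indexed by the group label.
reduced = grouped.sum()

# transform('sum') broadcasts: one value per original row, same index as df.
broadcast = grouped.transform('sum')

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(sum=broadcast)
print(result)
```

Because `broadcast` shares the index of `df`, it lines up row by row without any join.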

Option 2
Use join on the result of a normal groupby and sum.

df.join(df.groupby('col1').col2.sum().rename('sum'), on='col1')

  col1  col2  sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
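Splitting the join into two steps shows why it works: the grouped sum is a Series indexed by the group label, and `on='col1'` matches the values in the `col1` column against that index. This is a sketch on the same assumed data, with `totals` and `joined` as illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b'],
    'col2': [1, 2, 3, 1, 2],
})

# Series indexed by col1; rename gives the joined column its name.
totals = df.groupby('col1')['col2'].sum().rename('sum')

# Look up each row's col1 value in the index of totals.
joined = df.join(totals, on='col1')
print(joined)
```

The rename matters here, since the joined column takes its name from the Series.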

Option 3
A creative approach with pd.factorize and np.bincount.

f, u = df.col1.factorize()
df.assign(sum=np.bincount(f, df.col2).astype(df.col2.dtype)[f])

  col1  col2  sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
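The one-liner is easier to follow with the intermediate arrays spelled out. This is a sketch on the same assumed data, using longer names (`codes`, `uniques`, `sums`, `per_row`) than the original `f` and `u`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b'],
    'col2': [1, 2, 3, 1, 2],
})

# factorize maps each label to an integer code: a -> 0, b -> 1.
codes, uniques = df['col1'].factorize()

# bincount with weights adds up col2 per code; the result is a float array.
sums = np.bincount(codes, weights=df['col2'].to_numpy())

# Cast back to the dtype of col2, then index by codes to get one sum per row.
per_row = sums.astype(df['col2'].dtype)[codes]

result = df.assign(sum=per_row)
print(result)
```

One caveat: by default factorize marks missing labels with the code -1, which np.bincount rejects, so this approach assumes col1 has no missing values.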
