python - How to add an additional sum column to a DataFrame based on specific column groups?
In this case, I have a DataFrame like

    col1  col2
    a     1
    a     2
    a     3
    b     1
    b     2

What I want is to first group by col1, sum the col2 values within each group, then add that sum to the DataFrame and get

    col1  col2  sum
    a     1     6
    a     2     6
    a     3     6
    b     1     3
    b     2     3
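For anyone who wants to reproduce the options that follow, the example frame can be built roughly like this. It is only a sketch: the variable name df matches the answers, and the first group label is assumed to be 'a', since the listing only shows 'b' explicitly.

```python
import pandas as pd

# Example data from the question.
# Assumption: the first group is labelled 'a'; only 'b' is visible in the listing.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b"],
    "col2": [1, 2, 3, 1, 2],
})

print(df)
```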
option 1
transform returns a result with the same index as the original object.
Use assign to return a copy of the DataFrame with the new column.
See the split-apply-combine documentation for more information.
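To make the index point concrete, here is a small sketch comparing a plain aggregation with transform. The names grouped, per_group and per_row are illustrative only, and the group label 'a' is an assumption about the question's data.

```python
import pandas as pd

# Assumed example data; the label 'a' is a guess at the question's first group.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b"],
    "col2": [1, 2, 3, 1, 2],
})

grouped = df.groupby("col1")["col2"]

# Aggregation: one value per group, indexed by the group keys.
per_group = grouped.sum()

# Transform: the group total broadcast back to every original row,
# indexed like df, so it can be assigned as a column directly.
per_row = grouped.transform("sum")

print(per_group)
print(per_row)
```

Because per_row shares df's index, no merge or reindexing step is needed before attaching it as a new column.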
    df.assign(sum=df.groupby('col1').col2.transform('sum'))

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3

option 2
Use join on the result of a normal groupby and sum.
    df.join(df.groupby('col1').col2.sum().rename('sum'), on='col1')

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3

option 3
A creative approach with pd.factorize and np.bincount.
    f, u = df.col1.factorize()
    df.assign(sum=np.bincount(f, df.col2).astype(df.col2.dtype)[f])

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3
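The factorize/bincount one-liner is dense, so here is a step-by-step sketch of what it appears to be doing. The intermediate names codes, uniques, totals and result are illustrative, and the group label 'a' is again an assumption about the question's data.

```python
import numpy as np
import pandas as pd

# Assumed example data; the label 'a' is a guess at the question's first group.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b"],
    "col2": [1, 2, 3, 1, 2],
})

# Step 1: encode each group label as an integer code (0 for 'a', 1 for 'b').
codes, uniques = pd.factorize(df["col1"])

# Step 2: sum col2 per code. With weights, bincount returns floats.
totals = np.bincount(codes, weights=df["col2"].to_numpy())

# Step 3: cast back to col2's dtype and index by the codes so that
# every row picks up the total of its own group.
result = df.assign(sum=totals.astype(df["col2"].dtype)[codes])

print(result)
```

One caveat with this sketch: by default factorize marks missing labels with the code -1, which np.bincount rejects, so it assumes col1 has no NaN values.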