python - How to add an additional sum column to a DataFrame based on specific column groups?
In this case, I have a DataFrame like

    col1  col2
    a     1
    a     2
    a     3
    b     1
    b     2

What I want is to first group by col1, sum col2 within each group, and then add that sum back to the DataFrame as a new column, to get

    col1  col2  sum
    a     1     6
    a     2     6
    a     3     6
    b     1     3
    b     2     3
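As a starting point, here is a minimal sketch that builds the example frame and computes the plain per-group totals. The group labels 'a' and 'b' are an assumption inferred from the expected sums (6 for the first three rows, 3 for the last two), and the name group_totals is only illustrative.

```python
import pandas as pd

# Example data; the labels 'a' and 'b' in col1 are assumed from the expected sums.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b"],
    "col2": [1, 2, 3, 1, 2],
})

# A plain groupby-sum yields one value per group, indexed by the col1 labels.
# It is not yet aligned with the five original rows.
group_totals = df.groupby("col1")["col2"].sum()
print(group_totals)
```

The result has two entries while the frame has five rows, which is why the totals still have to be broadcast back onto the original rows.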
Option 1
transform returns a result with the same index as the original object, so the group sums line up with the original rows. Use assign to return a copy of the DataFrame with the new column. See the split-apply-combine documentation for more information.
    df.assign(sum=df.groupby('col1').col2.transform('sum'))

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3
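A self-contained sketch of the transform approach, assuming the example data uses the labels 'a' and 'b' in col1. The intermediate name totals is only there for illustration.

```python
import pandas as pd

# Example data; the labels 'a' and 'b' are assumed.
df = pd.DataFrame({"col1": list("aaabb"), "col2": [1, 2, 3, 1, 2]})

# transform('sum') computes each group's total and broadcasts it back
# onto every row of that group, keeping df's original index.
totals = df.groupby("col1")["col2"].transform("sum")

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(sum=totals)
print(result)
```

Because sum is also the name of a DataFrame method, the new column should be read with result["sum"] rather than result.sum.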
Option 2
Use join on the result of a normal groupby and sum.
    df.join(df.groupby('col1').col2.sum().rename('sum'), on='col1')

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3
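The join variant, split into two steps to show what is being matched. This is a sketch on the assumed example data (labels 'a' and 'b'), and per_group is an illustrative name.

```python
import pandas as pd

# Example data; the labels 'a' and 'b' are assumed.
df = pd.DataFrame({"col1": list("aaabb"), "col2": [1, 2, 3, 1, 2]})

# One total per group, indexed by the col1 labels.
# rename('sum') gives the Series the name that becomes the new column.
per_group = df.groupby("col1")["col2"].sum().rename("sum")

# on='col1' matches the values in df's col1 column against per_group's index.
result = df.join(per_group, on="col1")
print(result)
```

The rename matters here: join needs the Series to have a name, and without it the new column would collide with the existing col2.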
Option 3
A creative approach using pd.factorize and np.bincount.
    f, u = df.col1.factorize()
    df.assign(sum=np.bincount(f, df.col2).astype(df.col2.dtype)[f])

      col1  col2  sum
    0    a     1    6
    1    a     2    6
    2    a     3    6
    3    b     1    3
    4    b     2    3
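To make the factorize and bincount steps easier to follow, here is the same idea unrolled, again on the assumed example data (labels 'a' and 'b'). The name sums is illustrative.

```python
import numpy as np
import pandas as pd

# Example data; the labels 'a' and 'b' are assumed.
df = pd.DataFrame({"col1": list("aaabb"), "col2": [1, 2, 3, 1, 2]})

# f holds one integer code per row (0 for the first label seen, 1 for the next, ...);
# u holds the unique labels in order of first appearance.
f, u = pd.factorize(df["col1"])

# With weights, bincount adds up col2 for each code instead of counting rows.
# The weighted result is a float array, one entry per group.
sums = np.bincount(f, weights=df["col2"].to_numpy())

# Cast back to col2's dtype, then index with f to expand to one value per row.
result = df.assign(sum=sums.astype(df["col2"].dtype)[f])
print(result)
```

One caveat with this approach: by default factorize marks missing values in col1 with the code -1, and np.bincount rejects negative inputs, so rows with a missing group label need to be handled before this step.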