python - How to spread a column in a Pandas data frame
I have the following pandas data frame:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'fc': [100, 100, 112, 1.3, 14, 125],
        'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2'],
        'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
    })
    df = df[['gene_symbol', 'sample_id', 'fc']]
    df
which produces this:
    Out[11]:
      gene_symbol sample_id     fc
    0           a        s1  100.0
    1           b        s1  100.0
    2           c        s1  112.0
    3           a        s2    1.3
    4           b        s2   14.0
    5           c        s2  125.0
How can I spread sample_id into separate columns, so that I end up with this:
    gene_symbol   s1     s2
    a            100    1.3
    b            100   14.0
    c            112  125.0
You can use pivot:

    #df = df[['gene_symbol', 'sample_id', 'fc']]
    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    print (df)
    sample_id       s1     s2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
Or set_index with unstack:

    df = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)
    print (df)
    sample_id       s1     s2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
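The two approaches above give the same frame when every gene/sample pair is present; they differ in how a missing pair is filled. A small sketch, reusing the question's data and dropping one row purely for illustration (the names `partial`, `wide_nan` and `wide_zero` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Both reshapes: gene_symbol -> rows, sample_id values -> columns, fc -> cells
wide_pivot = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
wide_unstack = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack()

# Illustrative only: drop gene 'b' in sample 's2' to create a missing pair
partial = df.drop(index=4)

# pivot leaves the hole as NaN; unstack(fill_value=0) writes 0 there instead
wide_nan = partial.pivot(index='gene_symbol', columns='sample_id', values='fc')
wide_zero = partial.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)
print(wide_nan)
print(wide_zero)
```

So `fill_value=0` matters only when some combinations are absent; whether 0 or NaN is the right placeholder for a missing fold change depends on the analysis.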
But if there are duplicates, you need pivot_table or an aggregation with groupby. The aggregating function, here mean, can be changed to sum, median, ...:
    df = pd.DataFrame({
        'fc': [100, 100, 112, 1.3, 14, 125, 100],
        'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's2'],
        'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    })
    print (df)
          fc gene_symbol sample_id
    0  100.0           a        s1
    1  100.0           b        s1
    2  112.0           c        s1
    3    1.3           a        s2
    4   14.0           b        s2
    5  125.0           c        s2   <- same c, s2, different fc
    6  100.0           c        s2   <- same c, s2, different fc
Now pivot fails:

    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    ValueError: Index contains duplicate entries, cannot reshape
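To see which rows trigger this error before reshaping, `DataFrame.duplicated` can flag repeated (gene_symbol, sample_id) pairs. A minimal sketch; the names `key`, `dupes` and `can_pivot` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

key = ['gene_symbol', 'sample_id']
# keep=False flags every row of a repeated pair,
# not only the second and later occurrences
dupes = df[df.duplicated(subset=key, keep=False)]
print(dupes)

# pivot is only safe when no (gene_symbol, sample_id) pair repeats
can_pivot = dupes.empty
print(can_pivot)
```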
With pivot_table:

    df = df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
    print (df)
    sample_id       s1     s2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
Or with groupby, aggregation and unstack:

    df = df.groupby(['gene_symbol', 'sample_id'])['fc'].mean().unstack(fill_value=0)
    print (df)
    sample_id       s1     s2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
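Swapping the aggregation works the same way in both routes: pass the name to `aggfunc` for pivot_table, or to `.agg()` for groupby. A sketch looping over mean, sum and median on the duplicated data (the `results` dict is only for inspection):

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

results = {}
for func in ['mean', 'sum', 'median']:
    # pivot_table route: aggfunc decides how duplicate cells are combined
    via_pivot_table = df.pivot_table(index='gene_symbol', columns='sample_id',
                                     values='fc', aggfunc=func)
    # groupby route: reduce each (gene, sample) group, then move sample_id to columns
    via_groupby = df.groupby(['gene_symbol', 'sample_id'])['fc'].agg(func).unstack()
    results[func] = (via_pivot_table, via_groupby)
    print(func)
    print(via_pivot_table)
```

Only the duplicated cell (c, s2) changes between functions: mean and median both give 112.5 for the two values 125 and 100, while sum gives 225.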
EDIT: To clean up, set the columns name to None and call reset_index:
    df.columns.name = None
    df = df.reset_index()
    print (df)
      gene_symbol     s1     s2
    0           a  100.0    1.3
    1           b  100.0   14.0
    2           c  112.0  112.5
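The same cleanup can also be written as one method chain, using `rename_axis(columns=None)` in place of assigning `df.columns.name`. A sketch assuming pandas 0.24 or newer (where `rename_axis` accepts the `columns` keyword); `tidy` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

# Aggregate duplicates, drop the leftover 'sample_id' columns label,
# then turn gene_symbol back into an ordinary column
tidy = (
    df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
      .rename_axis(columns=None)
      .reset_index()
)
print(tidy)
```

Chaining avoids reassigning `df` between steps, so the original long-format frame stays available.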