python - Merge Pandas Dataframe on a column with structured data
Scenario: Following a previous question on how to read an Excel file from a server into a dataframe (How to read an Excel file directly from a server with Python), I am trying to merge the contents of multiple dataframes (which contain data from Excel worksheets).
Issue: After searching for similar issues here on SO, I am still not able to solve the problem.
Format of the data (each sheet is read into a dataframe):
sheet 1 (db1)

    name  cusip  date        price
    a     xxx    01/01/2001  100
    b     aaa    02/05/2005  90
    c     zzz    03/07/2006  95

sheet 2 (db2)

    ident  cusip  value  class
    123    xxx    0.5    aa
    444    aaa    1.3    ab
    555    zzz    2,8    ac
Wanted output (fnl):

    name  cusip  date        price  ident  value  class
    a     xxx    01/01/2001  100    123    0.5    aa
    b     aaa    02/05/2005  90     444    1.3    ab
    c     zzz    03/07/2006  95     555    2.8    ac
What I tried: I am trying to use the merge function to match each dataframe, but I am getting an error on the "how" part.
    fnl = db1
    fnl = fnl.merge(db2, how='outer', on=['cusip'])
    fnl = fnl.merge(db3, how='outer', on=['cusip'])
    fnl = fnl.merge(bte, how='outer', on=['cusip'])
I also tried concatenate, but I get a list of dataframes instead of a single output.
    wsframes = [db1, db2, db3]
    fnl = pd.concat(wsframes, axis=1)
Question: What is the proper way to do this operation?
It seems you need:
    from functools import reduce

    # many dataframes
    dfs = [df1, df2]
    df = reduce(lambda x, y: x.merge(y, on='cusip', how='outer'), dfs)
    print(df)

      name cusip        date  price  ident value class
    0    a   xxx  01/01/2001    100    123   0.5    aa
    1    b   aaa  02/05/2005     90    444   1.3    ab
    2    c   zzz  03/07/2006     95    555   2,8    ac
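To try the reduce approach without the Excel files, here is a minimal self-contained sketch. The two frames are typed in by hand from the sample tables in the question, so the dtypes are assumptions (for example, value is kept as text because of the "2,8" entry), and the first name is assumed to be "a".

```python
from functools import reduce

import pandas as pd

# Sample frames rebuilt by hand from the question's tables (assumed dtypes).
df1 = pd.DataFrame({
    "name": ["a", "b", "c"],  # "a" is an assumed value for the first row
    "cusip": ["xxx", "aaa", "zzz"],
    "date": ["01/01/2001", "02/05/2005", "03/07/2006"],
    "price": [100, 90, 95],
})
df2 = pd.DataFrame({
    "ident": [123, 444, 555],
    "cusip": ["xxx", "aaa", "zzz"],
    "value": ["0.5", "1.3", "2,8"],  # kept as strings, as in the sheet
    "class": ["aa", "ab", "ac"],
})

dfs = [df1, df2]

# reduce folds the list pairwise: (df1 merge df2), then that result
# merged with the next frame, and so on for longer lists.
df = reduce(lambda x, y: x.merge(y, on="cusip", how="outer"), dfs)
print(df)
```

Row order after an outer merge can differ between pandas versions, so sort on cusip explicitly if the order matters.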
But the columns in each dataframe have to be different (no matching columns other than the key, cusip here), otherwise you get _x, _y suffixes:
    dfs = [df1, df1, df2]
    df = reduce(lambda x, y: x.merge(y, on='cusip', how='outer'), dfs)
    print(df)

      name_x cusip      date_x  price_x name_y      date_y  price_y  ident value  \
    0      a   xxx  01/01/2001      100      a  01/01/2001      100    123   0.5
    1      b   aaa  02/05/2005       90      b  02/05/2005       90    444   1.3
    2      c   zzz  03/07/2006       95      c  03/07/2006       95    555   2,8

      class
    0    aa
    1    ab
    2    ac
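If the repeated columns hold the same data, one possible way to avoid the suffixes is to drop from each right-hand frame the columns that the accumulated result already has before merging. The sketch below works under that assumption. merge_on_key is a hypothetical helper name, not a pandas function, and the sample frames are typed in by hand from the question (first name assumed to be "a").

```python
from functools import reduce

import pandas as pd

# Sample frames rebuilt by hand from the question's tables (assumed dtypes).
df1 = pd.DataFrame({
    "name": ["a", "b", "c"],  # "a" is an assumed value for the first row
    "cusip": ["xxx", "aaa", "zzz"],
    "date": ["01/01/2001", "02/05/2005", "03/07/2006"],
    "price": [100, 90, 95],
})
df2 = pd.DataFrame({
    "ident": [123, 444, 555],
    "cusip": ["xxx", "aaa", "zzz"],
    "value": ["0.5", "1.3", "2,8"],
    "class": ["aa", "ab", "ac"],
})


def merge_on_key(left, right, key="cusip"):
    """Hypothetical helper: outer-merge on `key`, skipping columns of
    `right` that `left` already contains (assumes they are duplicates)."""
    new_cols = [c for c in right.columns if c == key or c not in left.columns]
    return left.merge(right[new_cols], on=key, how="outer")


# df1 appears twice on purpose, to mimic the overlapping-columns case.
df = reduce(merge_on_key, [df1, df1, df2])
print(df.columns.tolist())
```

If both copies of an overlapping column are needed, merge also accepts a suffixes argument, for example suffixes=('_db1', '_db2'), to replace the default _x and _y.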