python - Merge Pandas Dataframe on a column with structured data


Scenario: Following up on a previous question about reading an Excel file from a server into a dataframe (How to read an Excel file directly from a server with Python), I am now trying to merge the contents of multiple dataframes (which contain data from Excel worksheets).

Issue: After searching for similar issues here on SO, I am still not able to solve the problem.

Format of the data (each sheet is read into a dataframe):

sheet 1 (db1)

    name       cusip       date       price
               xxx     01/01/2001     100
    b          aaa     02/05/2005      90
    c          zzz     03/07/2006      95

sheet 2 (db2)

    ident      cusip       value      class
    123        xxx          0.5        aa
    444        aaa          1.3        ab
    555        zzz          2,8        ac

Wanted output (fnl):

    name       cusip       date       price       ident       value      class
               xxx     01/01/2001     100         123          0.5        aa
    b          aaa     02/05/2005      90         444          1.3        ab
    c          zzz     03/07/2006      95         555          2.8        ac

What I tried: I am trying to use the merge function to match each dataframe, but I am getting an error on the "how" part.

    fnl = db1
    fnl = fnl.merge(db2, how='outer', on=['cusip'])
    fnl = fnl.merge(db3, how='outer', on=['cusip'])
    fnl = fnl.merge(bte, how='outer', on=['cusip'])

I also tried concatenating, but I get a list of dataframes instead of a single output.

    wsframes = [db1, db2, db3]
    fnl = pd.concat(wsframes, axis=1)

Question: What is the proper way to do this operation?

It seems you need:

    from functools import reduce

    # many dataframes
    dfs = [df1, df2]
    df = reduce(lambda x, y: x.merge(y, on='cusip', how='outer'), dfs)
    print(df)

      name cusip        date  price  ident value class
    0        xxx  01/01/2001    100    123   0.5    aa
    1    b   aaa  02/05/2005     90    444   1.3    ab
    2    c   zzz  03/07/2006     95    555   2,8    ac
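To try the reduce approach without the original Excel files, here is a minimal, self-contained sketch. The sample frames are rebuilt by hand from the question's tables; the first name ("a"), storing value as strings (because of the "2,8" entry), and the helper name merge_all are assumptions made for illustration, not part of the original post.

```python
from functools import reduce

import pandas as pd

# Sample data rebuilt from the question's tables.
# Assumption: the first name is "a" (it is blank in the scraped tables).
db1 = pd.DataFrame({
    "name": ["a", "b", "c"],
    "cusip": ["xxx", "aaa", "zzz"],
    "date": ["01/01/2001", "02/05/2005", "03/07/2006"],
    "price": [100, 90, 95],
})
# Assumption: "value" is kept as strings because one entry uses a comma ("2,8").
db2 = pd.DataFrame({
    "ident": [123, 444, 555],
    "cusip": ["xxx", "aaa", "zzz"],
    "value": ["0.5", "1.3", "2,8"],
    "class": ["aa", "ab", "ac"],
})


def merge_all(frames, key="cusip", how="outer"):
    """Hypothetical helper: fold a list of frames into one by merging on `key`."""
    return reduce(lambda left, right: left.merge(right, on=key, how=how), frames)


fnl = merge_all([db1, db2])
print(fnl)
```

Because reduce applies the merge pairwise from left to right, the same call should work for three or more frames, as long as each one carries the cusip column.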

But the columns in each dataframe have to be different (apart from the matched column, cusip here); otherwise you get _x, _y suffixes:

    dfs = [df1, df1, df2]
    df = reduce(lambda x, y: x.merge(y, on='cusip', how='outer'), dfs)
    print(df)

      name_x cusip      date_x  price_x name_y      date_y  price_y  ident value  \
    0          xxx  01/01/2001      100         01/01/2001      100    123   0.5
    1      b   aaa  02/05/2005       90      b  02/05/2005       90    444   1.3
    2      c   zzz  03/07/2006       95      c  03/07/2006       95    555   2,8

      class
    0    aa
    1    ab
    2    ac
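If the sheets really do share column names besides cusip, two common ways to deal with the suffixes can be sketched as follows. The frames left and right and their values are hypothetical; only the merge parameters (on, how, suffixes) come from pandas itself.

```python
import pandas as pd

# Two hypothetical sheets that both carry a "price" column besides the key.
left = pd.DataFrame({"cusip": ["xxx", "aaa"], "price": [100, 90]})
right = pd.DataFrame({"cusip": ["xxx", "aaa"], "price": [101, 91], "ident": [123, 444]})

# Default behaviour: the clashing column is split into price_x / price_y.
default = left.merge(right, on="cusip", how="outer")

# Option 1: choose explicit suffixes so the origin of each column is visible.
labelled = left.merge(right, on="cusip", how="outer", suffixes=("_db1", "_db2"))

# Option 2: keep only the key plus the columns the left frame does not have yet.
new_cols = right.columns.difference(left.columns).tolist()
trimmed = left.merge(right[["cusip"] + new_cols], on="cusip", how="outer")

print(default.columns.tolist())
print(labelled.columns.tolist())
print(trimmed.columns.tolist())
```

Option 2 silently discards the right-hand copy of any shared column, so it only fits when those columns are known to be duplicates.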
