python 3.x - Create new column and fill with added strings from same df with underscore as delimiter


I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'place': [1,2,3], 'date': ['6/7/2021','1/1/2031','1/4/2011'], 'city':['moscow','new york','sidney'], 'kids':[5,3,1]})

which looks like this:

city        date      kids   place
moscow    6/7/2021     5       1
new york  1/1/2031     3       2
sidney    1/4/2011     1       3

I need to add a new column 'key' to the dataframe, whose values are the string union of the values in each row of the columns 'city', 'date' and 'place', with underscore ('_') as the delimiter.

The eventual goal:

city        date      kids   place   key
moscow    6/7/2021     5       1     moscow_6/7/2021_1
new york  1/1/2031     3       2     new york_1/1/2031_2
sidney    1/4/2011     1       3     sidney_1/4/2011_3

First of all, I've converted the integers in the 'place' column to strings:

df['place'].apply(str) 

Secondly, I've removed the columns I don't need from the column list (but the column should still be in the resulting df):

col_list = list(df)
col_list.remove('kids')

Then I've tried to create the new column and fill it with the appended strings:

df['key'] = df[col_list].apply(lambda x: x.sum()) 

But this returns NaN values in the 'key' column, and the values are still not joined with the underscore delimiter.

You can add the columns together, similar to building a str; you just need to cast the last column to str:

In [87]:
df['key'] = df['city'] + '_' + df['date'] + '_' + df['place'].astype(str)
df

Out[87]:
       city      date  kids  place                  key
0    moscow  6/7/2021     5      1    moscow_6/7/2021_1
1  new york  1/1/2031     3      2  new york_1/1/2031_2
2    sidney  1/4/2011     1      3    sidney_1/4/2011_3
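A minimal sketch of why the attempt in the question probably produced NaN, assuming the cause is the default `axis=0` of `DataFrame.apply`: the function then runs once per column, so the result is indexed by column name and cannot align with the row labels. The names `cols`, `per_column` and `misaligned` are only for illustration. Passing `axis=1` joins per row instead:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'place': [1, 2, 3],
    'date': ['6/7/2021', '1/1/2031', '1/4/2011'],
    'city': ['moscow', 'new york', 'sidney'],
    'kids': [5, 3, 1],
})
cols = ['city', 'date', 'place']

# Default axis=0: '_'.join is called once per column, so the result
# is indexed by column name ('city', 'date', 'place'), not by row label.
per_column = df[cols].astype(str).apply('_'.join)

# Assigning that Series aligns on the index; no row label matches,
# so every value in the new column is NaN.
misaligned = df.assign(key=per_column)

# axis=1: '_'.join is called once per row, so the result lines up
# with df's own index.
df['key'] = df[cols].astype(str).apply('_'.join, axis=1)
print(df)
```

This row-wise version gives the same keys as the column addition above, but it calls Python code once per row, which matters for the timings.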

Timings for a 30k row df:

%timeit df['city'] + '_' + df['date'] + '_' + df['place'].astype(str)
%timeit df[['city', 'date', 'place']].astype(str).apply('_'.join, 1)
%timeit df[['city', 'date', 'place']].astype(str).transform('_'.join, 1)

100 loops, best of 3: 9 ms per loop
10 loops, best of 3: 84 ms per loop
10 loops, best of 3: 83 ms per loop

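This method seems to be about 10x faster (9 ms against 83-84 ms). It's a straight vectorised addition, whereas apply is a cythonised for loop. It looks like transform needs some form of grouping, sorting, and creating a new object that is index-aligned to the original df, which is where I guess the time is being taken.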
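To reproduce the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The helper names (`make_frame`, `key_vectorised`, `key_apply`) and the filler values are made up for illustration, and the numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd


def make_frame(n_rows):
    """Build a frame shaped like the one in the question (made-up values)."""
    return pd.DataFrame({
        'place': list(range(n_rows)),
        'date': ['6/7/2021'] * n_rows,
        'city': ['moscow'] * n_rows,
        'kids': [5] * n_rows,
    })


def key_vectorised(df):
    # Whole-column string addition.
    return df['city'] + '_' + df['date'] + '_' + df['place'].astype(str)


def key_apply(df):
    # One call to '_'.join per row.
    return df[['city', 'date', 'place']].astype(str).apply('_'.join, axis=1)


big = make_frame(30_000)
for func in (key_vectorised, key_apply):
    # Average over 3 calls; %timeit picks the repeat counts automatically.
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```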

The other method, from @johngalt:

%timeit pd.Series(df[['city', 'date', 'place']].astype(str).values.tolist()).str.join('_')

100 loops, best of 3: 10.5 ms per loop
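As a small sketch of how that list-based variant can be assigned back: `Series.str.join` joins the list stored in each element. Passing `index=df.index` is my addition, not part of the timed one-liner; it only matters when df does not have the default 0..n-1 index, because otherwise the assignment would align on mismatched labels.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'place': [1, 2, 3],
    'date': ['6/7/2021', '1/1/2031', '1/4/2011'],
    'city': ['moscow', 'new york', 'sidney'],
    'kids': [5, 3, 1],
})

# One list of strings per row, e.g. ['moscow', '6/7/2021', '1'].
rows_as_lists = df[['city', 'date', 'place']].astype(str).values.tolist()

# Wrap the lists in a Series that shares df's index, then join each list.
df['key'] = pd.Series(rows_as_lists, index=df.index).str.join('_')
print(df)
```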

So it's marginally slower than this answer.

