Python - Optimizing DataFrame manipulation: new column based on conditional logic and multiple columns
Currently, this works:

    df['new'] = df.apply(
        lambda x: address[int(x['c1'][:5], 2)] + '_' + str(int(x['c1'][6:11], 2))
        if x['c1'][5] == '1'
        else address[int(x['c2'][:5], 2)] + '_' + str(int(x['c2'][6:11], 2)),
        axis=1)

where address is a dictionary.

But it's really slow. Specifically, applying to the whole DataFrame is considerably slower than applying to a selected column. However, the new column is based on multiple columns, and I'm not sure how to implement that.

Additionally, is there a way to vectorize these types of logical/conditional statements?
Sample DataFrame:

    <bound method DataFrame.head of
                         c1                c2
    0      0000100111000111  0010110011000111
    1      0001000111000111  0010110011000111
    2      0101010001001010  0000000000000000
    3      0101010010001110  0000000000000000
    4      0101010011101010  0000000000000000
    5      0111111100000100  0000000000000000
    6      0111110010010110  0000000000000000
    7      1000000001001100  0000000000000000
    8      1110011110001000  0000000000000000
    9      0000100001010000  0000000000000000
    10     0001000001001010  0000000000000000
    11     0101101100100100  0000000000000000
    12     1110001100100100  0000000000000000
    13     0010100101101001  0101010101101001
    14     0000100101100000  0000000000000000
    15     0000100110100000  0000000000000000
    16     0001000101101011  0000000000000000
    17     1001110000100001  0000000000000000
    18     0111111000100000  0000000000000000
    19     1000000100010110  0000000000000000
    20     1110001111000010  0000000000000000
    21     1011010001000010  0000000000000000
    22     0110010001001111  0000000000000000
    23     0111110000110101  0000000000000000
    24     0111110001001100  0000000000000000
    25     1000000000111101  0000000000000000
    26     0000110001100010  0000000000000000
    27     0001010001100010  0000000000000000
    28     1100100100100101  1001011000000101
    29     0101000010101010  0111110001001010
    ...                 ...               ...
    95714  0101111100011000  0000000000000000
    95715  0010101011001011  0000000000000000
    95716  0010100111100110  0101010110100110
    95717  0010101000100100  0101011011100100
    95718  0101000110000101  0000000000000000
You need a vectorized if-then-else, known as np.where (np stands for numpy, in this case).
    import numpy as np

    # Take the first five bits of c1 where bit 5 of c1 is '1', otherwise of c2.
    df['new'] = np.where(df['c1'].str[5] == '1', df['c1'].str[:5], df['c2'].str[:5])

    #                  c1                c2    new
    # 0  0000100111000111  0010110011000111  00101
    # 1  0001000111000111  0010110011000111  00101
    # 2  0101010001001010  0000000000000000  01010
    # ....
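The np.where call above only selects the first five bits. Below is a minimal sketch of how the question's full expression (dictionary lookup, underscore, second bit field) might be built the same way. The contents of address are not shown in the question, so the dictionary here is a made-up placeholder, and the helper names src, key and num are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: the question's `address` dict is not shown,
# so every 5-bit value is mapped to a made-up label.
address = {i: f"addr{i}" for i in range(32)}

# First three rows of the sample data from the question.
df = pd.DataFrame({
    "c1": ["0000100111000111", "0001000111000111", "0101010001001010"],
    "c2": ["0010110011000111", "0010110011000111", "0000000000000000"],
})

# Choose the source string per row with a single vectorized comparison:
# c1 where character 5 of c1 is '1', otherwise c2.
src = pd.Series(
    np.where(df["c1"].str[5] == "1", df["c1"], df["c2"]),
    index=df.index,
)

# Parse both bit fields from the chosen column only.
key = src.str[:5].map(lambda bits: int(bits, 2))
num = src.str[6:11].map(lambda bits: int(bits, 2))

# Dictionary lookup via Series.map, then string concatenation.
df["new"] = key.map(address) + "_" + num.astype(str)

print(df["new"].tolist())  # ['addr5_6', 'addr5_6', 'addr10_2']
```

The string slicing and int(bits, 2) parsing still run element by element in Python, so this is not fully vectorized. It does avoid building a row object for every call, which is usually the main cost of apply with axis=1, but it has not been timed against the original here.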