python - Creating a new column in a dataframe based on the result of the addition of three others
I have produced the following code:
data['customer_segment'] = np.where(((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=5,1), np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>5 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=8,2), np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>8 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=11,3), np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>11 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=14,4),5)
I'm getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would appreciate help reaching the best solution, as I feel the one I'm trying may not be optimal.
An example of the input follows:
movc % segment   order_size_seg   order frequency segment
1                2                3
5                2                1
5                5                5
I am trying to add a column based on the result of summing each row, as follows:
if 3-5: 1
if 6-8: 2
if 9-11: 3
if 12-14: 4
if 15+: 5
How would I do this?
I think you need numpy.select instead of multiple np.where calls:
# sum the values only once
a = data['order frequency segment'] + data['order_size_seg'] + data['movc % segment']

# conditions
m1 = a <= 5
m2 = (a > 5) & (a <= 8)
m3 = (a > 8) & (a <= 11)
m4 = (a > 11) & (a <= 14)

data['customer_segment'] = np.select([m1, m2, m3, m4], [1, 2, 3, 4], default=5)
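As a self-contained check, here is a minimal sketch of the numpy.select approach run on the sample input from the question. The DataFrame construction is an assumption based on the example table (read row by row). Note that each comparison is wrapped in parentheses: `&` binds more tightly than `>` and `<=`, so an unparenthesized `a > 5 & a <= 8` is parsed as the chained comparison `a > (5 & a) <= 8`, which asks for the truth value of a Series and is a likely source of the ValueError in the question.

```python
import numpy as np
import pandas as pd

# Sample input reconstructed from the question's example (assumed layout)
data = pd.DataFrame({
    'movc % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'order frequency segment': [3, 1, 5],
})

# Sum the three segment columns once
a = (data['order frequency segment']
     + data['order_size_seg']
     + data['movc % segment'])

# One boolean mask per segment; parentheses are required around each comparison
conditions = [
    a <= 5,
    (a > 5) & (a <= 8),
    (a > 8) & (a <= 11),
    (a > 11) & (a <= 14),
]

# Rows matching none of the masks (sum of 15 or more) get the default, 5
data['customer_segment'] = np.select(conditions, [1, 2, 3, 4], default=5)
segments = data['customer_segment'].tolist()
print(segments)
```

With this input the row sums are 6, 8 and 15, so the segments come out as 2, 2 and 5.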
Another solution is to use pd.cut:
bins = [-np.inf, 5, 8, 11, 14, np.inf]
labels = [1, 2, 3, 4, 5]
# a is the row-wise sum of the three segment columns
data['customer_segment'] = pd.cut(a, bins=bins, labels=labels)
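A minimal sketch of the pd.cut variant on the same sample input, again assuming the example table is read row by row. It relies on pd.cut's default right-closed intervals, so the bins are (-inf, 5], (5, 8], (8, 11], (11, 14] and (14, inf].

```python
import numpy as np
import pandas as pd

# Sample input reconstructed from the question's example (assumed layout)
data = pd.DataFrame({
    'movc % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'order frequency segment': [3, 1, 5],
})

# Row-wise sum of the three segment columns
cols = ['order frequency segment', 'order_size_seg', 'movc % segment']
a = data[cols].sum(axis=1)

bins = [-np.inf, 5, 8, 11, 14, np.inf]
labels = [1, 2, 3, 4, 5]

# pd.cut returns a categorical column whose categories are the labels
data['customer_segment'] = pd.cut(a, bins=bins, labels=labels)

# Convert to plain integers if a numeric column is preferred
cut_segments = data['customer_segment'].astype(int).tolist()
print(cut_segments)
```

The result of pd.cut is a categorical column, so the `astype(int)` step is only needed if the segment should behave as an ordinary integer later on.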