Python - Creating a new column in a dataframe based on the result of the addition of three others
I have produced the following code:
    data['customer_segment'] = np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=5, 1,
        np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>5 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=8, 2,
        np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>8 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=11, 3,
        np.where((data['order frequency segment']+data['order_size_seg']+data['movc % segment'])>11 & (data['order frequency segment']+data['order_size_seg']+data['movc % segment'])<=14, 4, 5))))

I'm getting the following error:
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would appreciate help reaching the best solution, as I feel the one I'm trying may not be optimal.
An example of the input follows:

    movc % segment  order_size_seg  order frequency segment
    1               2               3
    5               2               1
    5               5               5

I am trying to add a column based on the result of summing each row, as follows:

    if 3-5    1
    if 6-8    2
    if 9-11   3
    if 12-14  4
    if 15+    5
would this
I think you need numpy.select instead of multiple np.where calls:
    import numpy as np
    import pandas as pd

    # compute the row sum only once
    a = data['order frequency segment'] + data['order_size_seg'] + data['movc % segment']
    # conditions (each comparison wrapped in parentheses)
    m1 = a <= 5
    m2 = (a > 5) & (a <= 8)
    m3 = (a > 8) & (a <= 11)
    m4 = (a > 11) & (a <= 14)

    data['customer_segment'] = np.select([m1, m2, m3, m4], [1, 2, 3, 4], default=5)

Another solution is to use cut:

    bins = [-np.inf, 5, 8, 11, 14, np.inf]
    labels = [1, 2, 3, 4, 5]

    # bin the row sum a computed above; intervals are right-inclusive by default
    data['customer_segment'] = pd.cut(a, bins=bins, labels=labels)
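To check both approaches against the sample input from the question, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the three example rows, and the names `cols`, `total`, `via_select` and `via_cut` are only illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's example input.
data = pd.DataFrame({
    "movc % segment": [1, 5, 5],
    "order_size_seg": [2, 2, 5],
    "order frequency segment": [3, 1, 5],
})

# Sum the three segment columns row by row (illustrative names).
cols = ["order frequency segment", "order_size_seg", "movc % segment"]
total = data[cols].sum(axis=1)

# Approach 1: numpy.select with explicitly parenthesized masks.
conditions = [
    total <= 5,
    (total > 5) & (total <= 8),
    (total > 8) & (total <= 11),
    (total > 11) & (total <= 14),
]
via_select = np.select(conditions, [1, 2, 3, 4], default=5)

# Approach 2: pandas.cut with right-inclusive bins; the result is
# categorical, so it is cast to int here for comparison.
bins = [-np.inf, 5, 8, 11, 14, np.inf]
via_cut = pd.cut(total, bins=bins, labels=[1, 2, 3, 4, 5]).astype(int)

data["customer_segment"] = via_select
print(data)
```

As for the original error: in Python, & binds more tightly than the comparison operators, so an expression like s > 5 & s <= 8 is parsed as the chained comparison s > (5 & s) <= 8. Evaluating that chain requires the truth value of a Series, which is what raises the ValueError. Wrapping each comparison in parentheses, as in the masks above, avoids it.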