python - Pandas: how to use pd.cut()
Here is the snippet:

    import pandas as pd

    test = pd.DataFrame({'days': [0,31,45]})
    test['range'] = pd.cut(test.days, [0,30,60])
    print (test)
Output:

       days     range
    0     0       NaN
    1    31  (30, 60]
    2    45  (30, 60]
I am surprised that 0 is not in (0, 30]. What should I do to categorize 0 as (0, 30]?
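By default the bins are open on the left, so 0 falls outside (0, 30]. Pass include_lowest=True to make the first interval include its left edge:

    test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
    print (test)

       days           range
    0     0  (-0.001, 30.0]
    1    31    (30.0, 60.0]
    2    45    (30.0, 60.0]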
See the difference:

    test = pd.DataFrame({'days': [0,20,30,31,45,60]})

    test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
    # with right=False, the value 30 is in the [30, 60) group
    test['range2'] = pd.cut(test.days, [0,30,60], right=False)
    # by default, the value 30 is in the (0, 30] group
    test['range3'] = pd.cut(test.days, [0,30,60])
    print (test)

       days          range1    range2    range3
    0     0  (-0.001, 30.0]   [0, 30)       NaN
    1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
    2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
    3    31    (30.0, 60.0]  [30, 60)  (30, 60]
    4    45    (30.0, 60.0]  [30, 60)  (30, 60]
    5    60    (30.0, 60.0]       NaN  (30, 60]
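To make the edge cases easier to spot, here is a minimal sketch that counts how many values end up without a bin under each variant. The variable names (`default`, `lowest`, `left_closed`, `unbinned`) are my own and only for illustration; the data and bin edges are the ones used above.

```python
import pandas as pd

days = pd.Series([0, 20, 30, 31, 45, 60])
bins = [0, 30, 60]

# Default: intervals are (0, 30] and (30, 60], so 0 falls outside every bin.
default = pd.cut(days, bins)

# include_lowest=True closes the first interval on the left, so 0 is binned.
lowest = pd.cut(days, bins, include_lowest=True)

# right=False flips the closed side: [0, 30) and [30, 60), so 60 falls outside.
left_closed = pd.cut(days, bins, right=False)

# Flag the values that did not land in any bin, then count them per variant.
unbinned = pd.DataFrame({
    'default': default.isna(),
    'include_lowest': lowest.isna(),
    'right_false': left_closed.isna(),
})
print(unbinned.sum())
```

With these inputs, only include_lowest=True bins every value; the default loses the 0 and right=False loses the 60. If both edges must be covered with right=False, one option is to extend the last edge beyond the maximum value.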
Or use numpy.searchsorted, which returns integer bin positions instead of intervals; the array of bin edges has to be sorted:
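    import numpy as np

    test = pd.DataFrame({'days': [0,20,30,31,45,60]})
    arr = np.array([0,30,60])

    # side='left' (default): position of the first edge >= value
    test['range1'] = arr.searchsorted(test.days)
    # side='right' minus 1: position of the last edge <= value
    test['range2'] = arr.searchsorted(test.days, side='right') - 1
    print (test)

       days  range1  range2
    0     0       0       0
    1    20       1       0
    2    30       1       1
    3    31       2       1
    4    45       2       1
    5    60       2       2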
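If you want labels rather than raw positions, the searchsorted result can be shifted and clipped into bin numbers. The following is only a sketch under stated assumptions: the label strings are hypothetical, every value is assumed to lie within [0, 60], and the input is deliberately unsorted to show that, as far as numpy.searchsorted is concerned, it is the searched array (the edges) that must be in order.

```python
import numpy as np
import pandas as pd

# Deliberately unsorted input values; the bin edges are sorted.
days = pd.Series([45, 0, 60, 30, 20, 31])
edges = np.array([0, 30, 60])

# side='left' gives i such that edges[i-1] < x <= edges[i] (right-closed bins).
pos = edges.searchsorted(days.to_numpy(), side='left')

# Shift to 0-based bin numbers and clip so the lowest edge (0) joins the
# first bin, mirroring include_lowest=True.
# Assumption: every value lies within [0, 60]; out-of-range values would be
# silently clipped into the first or last bin.
bin_no = np.clip(pos - 1, 0, len(edges) - 2)

# Hypothetical labels, chosen only for this illustration.
labels = np.array(['0-30', '31-60'])
result = pd.DataFrame({'days': days, 'bin': labels[bin_no]})
print(result)

# Cross-check against pd.cut with the same edges.
codes = pd.cut(days, edges, include_lowest=True).cat.codes.to_numpy()
print((bin_no == codes).all())
```

Unlike pd.cut, this approach never produces NaN, so out-of-range values need a separate check if they can occur.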