Python - Pandas aggregation on timedelta and its behaviour
I am struggling with aggregation on a timedelta column, including plotting. The raw data is available here. The data has submit (datetime), resolved (datetime), pausetime (timedelta), and resolved-submit-pausetime (the actual time to resolve).
    import pandas as pd

    test_df = pd.read_csv('test_df.csv')
    # convert the date time stamps
    test_df[['submit', 'resolved']] = test_df[['submit', 'resolved']].apply(pd.to_datetime)
    # convert pausetime and resolved-submit-pausetime to timedelta
    test_df['pausetime'] = pd.to_timedelta(test_df['pausetime'])
    test_df['resolved-submit-pausetime'] = pd.to_timedelta(test_df['resolved-submit-pausetime'])

I am trying to aggregate the mean for each day of 'resolved':
    test_df.groupby([pd.Grouper(key='resolved', freq='D')])['resolved-submit-pausetime'].mean()

which gives me the error 'DataError: No numeric types to aggregate'.
1) How can I aggregate on the mean?
2) Any guidance on plotting the trend of the mean time to resolve (x axis with the dates, y axis with the aggregated mean timedelta of 'resolved-submit-pausetime')?
Use this step to convert the timedelta column to seconds:
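    test_df['resolved-submit-pausetime'] = test_df['resolved-submit-pausetime'].astype('timedelta64[s]')

    0      1234.0
    1     27380.0
    2     33017.0
    3      5454.0
    4       433.0
    5      2302.0
    6     21753.0
    7      3405.0
    8      4779.0
    9      3974.0
    10     3389.0
    11      114.0
    Name: resolved-submit-pausetime, dtype: float64

The float64 result above comes from the pandas version used at the time; newer pandas releases may keep a timedelta dtype after this cast, in which case .dt.total_seconds() on the column should give the same numbers as floats.

Then run the groupby statement to compute the mean: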
    test_df.groupby([pd.Grouper(key='resolved', freq='D')])['resolved-submit-pausetime'].mean()

    resolved
    2017-04-01    20543.666667
    2017-04-02     7485.500000
    2017-04-03     3132.200000
    Name: resolved-submit-pausetime, dtype: float64

You can use pandas' built-in plotting tools for a quick and dirty plot of the mean time with respect to the groupby day:
    test_df.groupby([pd.Grouper(key='resolved', freq='D')])['resolved-submit-pausetime'].mean().plot()
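
For anyone who wants to try this without the CSV, here is a minimal self-contained sketch of the same idea. It is an illustration rather than the exact code from the answer: the 'resolved' timestamps and the 'resolve_seconds' column name are made up, the durations are the second counts shown above, and .dt.total_seconds() is swapped in for the astype cast because it should return float seconds across pandas versions.

```python
import pandas as pd

# Hypothetical sample data: the 'resolved' timestamps are invented for
# illustration; the durations are the second counts printed in the answer.
seconds = [1234, 27380, 33017, 5454, 433, 2302,
           21753, 3405, 4779, 3974, 3389, 114]
resolved = (["2017-04-01 10:00"] * 3
            + ["2017-04-02 10:00"] * 4
            + ["2017-04-03 10:00"] * 5)

df = pd.DataFrame({
    "resolved": pd.to_datetime(resolved),
    "resolved-submit-pausetime": pd.to_timedelta(seconds, unit="s"),
})

# Convert the timedelta column to plain float seconds so that mean() has a
# numeric column to aggregate. 'resolve_seconds' is a hypothetical name.
df["resolve_seconds"] = df["resolved-submit-pausetime"].dt.total_seconds()

# Mean time to resolve for each calendar day of 'resolved'.
daily_mean_seconds = (
    df.groupby(pd.Grouper(key="resolved", freq="D"))["resolve_seconds"].mean()
)

# Optional conversions: hours for a readable y axis, or back to timedelta.
daily_mean_hours = daily_mean_seconds / 3600
daily_mean_timedelta = pd.to_timedelta(daily_mean_seconds, unit="s")

print(daily_mean_seconds)
```

Calling daily_mean_hours.plot() should then give the trend with the dates on the x axis and the mean hours to resolve on the y axis, assuming matplotlib is installed.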