Using the groupby function on a Python/pandas DataFrame


I have a data frame in Python. The data looks like this:

    id  time     test  count
    1   01:25.5  1105  1
    2   02:25.9  1105  0
    3   03:25.5  1105  1
    4   04:25.5  1105  1
    5   05:25.9  1105  1
    6   06:25.5  1105  0
    7   07:25.9  1105  1
    8   08:25.6  1105  1
    9   09:25.9  1106  0
    10  10:25.6  1105  1
    11  11:26.0  1105  1
    12  12:25.6  1105  1
    13  14:22.0  1105  0
    14  14:25.6  1106  1
    15  15:26.0  1105  1
    16  16:25.6  1105  0
    17  17:22.0  1105  1
    18  18:25.7  1105  1
    19  19:26.0  1105  1
    20  20:25.7  1105  0
    21  21:25.1  1105  1
    22  22:25.7  1106  1
    23  22:33.7  1107  0
    24  24:25.7  1105  0
    25  25:26.1  1105  0
    26  27:25.7  1105  1
    27  22:35.7  1106  0

Now I want to group the records based on these conditions.

1) If, in a 3-minute window, there are 4 or more records for a particular test, group those records, count the number of records, and update the count value. The time of the grouped record should be the time of the last record that has been grouped.

2) If there are fewer than 4 records for a particular test in a given 3-minute window, leave the records as they are.

Expected output:

    id  time     test  count
    1   04:25.5  1105  04
    2   08:25.6  1105  04
    3   10:25.6  1105  1
    4   11:26.0  1105  1
    5   12:25.6  1105  1
    6   17:22.0  1105  04
    7   21:25.1  1105  04
    8   09:25.9  1106  0
    9   14:25.6  1106  1
    10  22:25.7  1106  1
    11  22:33.7  1107  0
    12  24:25.7  1105  0
    13  25:26.1  1105  0
    14  27:25.7  1105  1
    15  22:35.7  1106  0

Explanation of the output:

1) The 1st record is the group of records 1-4 of the input; count is the number of records in the group, and time is the last time in the group.

2) The 2nd record is the group of records 5-8 of the input; count is the number of records in the group, and time is the last time in the group.

3) Records 3, 4 and 5 are left as they are because, for test 1105, the 3-minute window from 10:25 to 13:25 contains only 3 records, which is below the minimum of 4 records needed to group them.

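4) The 6th record is the group of records 13-17 of the input, except record 14 (which belongs to test 1106); count is the number of records in the group, and time is the last time in the group.

5) The 7th record is the group of records 18-21 of the input; count is the number of records in the group, and time is the last time in the group.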

6) Records 8-15 of the output are left as they are because, for the particular test (1105, 1106 or 1107), no 3-minute window contains the minimum of 4 records.

How can I achieve this result?

Okay, I can't think of a "pandas" way to do this. Because the 3-minute window is relative and not absolute (i.e. not 0-3 minutes, 4-6 minutes, etc.), I don't see how you can use the resample() or rolling() type of functionality that is available.
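
To illustrate the relative versus absolute point, here is a minimal sketch using only the first four records for test 1105 from the sample data. It assumes the time strings are parsed with the format `%M:%S.%f`, which is my assumption about the data rather than something stated in the question.

```python
import pandas as pd

# First four records for test 1105 from the sample data (mm:ss.f).
times = pd.to_datetime(
    ["01:25.5", "02:25.9", "03:25.5", "04:25.5"], format="%M:%S.%f"
)
records = pd.Series(1, index=times)

# resample() cuts at fixed boundaries (00:00, 00:03, 00:06, ...),
# so these four records are split across two bins.
fixed_bins = records.resample("3min").count().tolist()
print(fixed_bins)  # [2, 2]

# A relative window starts at the first record instead,
# so all four records fall inside it (both ends inclusive).
start = times[0]
in_window = (times >= start) & (times <= start + pd.Timedelta(minutes=3))
relative_count = int(in_window.sum())
print(relative_count)  # 4
```

With fixed bins neither half reaches the minimum of 4, while the relative window does, which is why the loop-based approach is used here.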

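I'm not sure this is the most elegant way to accomplish what you're wanting. There is a main groupby().apply() function, a function to collapse 4 or more tests into 1 row, and a function to create the 3-minute filter. The rest of the code is cleaning up the columns and indexes to get the output you wish. It assumes your starting DataFrame is called df (with time already converted to a datetime column) and gives an output DataFrame called dfg.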
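
For reference, one possible way to prepare df from the sample data in the question is sketched below. The use of `io.StringIO`, the whitespace separator, and the `%M:%S.%f` time format are my assumptions about how the data is stored; adjust them to the real source.

```python
import io

import pandas as pd

# Sample data copied from the question (whitespace separated).
raw = """id time test count
1 01:25.5 1105 1
2 02:25.9 1105 0
3 03:25.5 1105 1
4 04:25.5 1105 1
5 05:25.9 1105 1
6 06:25.5 1105 0
7 07:25.9 1105 1
8 08:25.6 1105 1
9 09:25.9 1106 0
10 10:25.6 1105 1
11 11:26.0 1105 1
12 12:25.6 1105 1
13 14:22.0 1105 0
14 14:25.6 1106 1
15 15:26.0 1105 1
16 16:25.6 1105 0
17 17:22.0 1105 1
18 18:25.7 1105 1
19 19:26.0 1105 1
20 20:25.7 1105 0
21 21:25.1 1105 1
22 22:25.7 1106 1
23 22:33.7 1107 0
24 24:25.7 1105 0
25 25:26.1 1105 0
26 27:25.7 1105 1
27 22:35.7 1106 0
"""

# Keep the time column as text first, then parse it explicitly.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"time": str})

# Assumed format: minutes:seconds.fraction. Parsing it gives a datetime
# column, which allows timedelta arithmetic on the times.
df["time"] = pd.to_datetime(df["time"], format="%M:%S.%f")
```

Parsing adds a placeholder date to every value, which is why the date part is dropped again at the end of the processing.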

    import pandas as pd


    def x_min_filter(df, start_idx, mins=3):
        '''
        Returns a filter that starts at df[start_idx].time and ends at
        df[start_idx].time + mins

        Parameters
        ----------
        df : pandas.DataFrame
            The dataframe to filter
        start_idx : int
            The df index number to start at
        mins : int, default=3
            Number of minutes of the filtering window
            (start + mins = total window time)
        '''
        start_time = df['time'].iloc[start_idx]
        plus_x_min = start_time + pd.Timedelta(minutes=mins)
        # both ends of the window are inclusive
        time_fil = ((df.time >= start_time) & (df.time <= plus_x_min))
        return time_fil


    def collapse_gte_x(df, number_of_tests=4):
        '''
        Collapse the dataframe to 1 row if the number of entries is at
        least number_of_tests, otherwise return the dataframe unchanged

        Parameters
        ----------
        df : pandas.DataFrame
            The dataframe to collapse or return
        number_of_tests : int, default=4
            Minimum number of tests required to collapse the dataframe
            to 1 entry
        '''
        test_count = len(df.index)

        if test_count >= number_of_tests:
            # keep the last row (a copy, so the caller's slice is not
            # modified) and store the group size in its count column
            last = df.tail(1).copy()
            last['count'] = test_count
            return last
        else:
            return df


    def grp_x_mins(grp):
        '''
        Return a dataframe with the custom grouping applied;
        intended to be used with groupby.apply
        '''
        dfs = []  # collapsed (or not) dataframes to be combined
        next_index = 0

        # clean up the dataframe; errors='ignore' because newer pandas
        # versions may not pass the grouping column ('test') to apply
        grp = grp.reset_index().drop(columns=['index', 'test', 'id'],
                                     errors='ignore')

        # loop through the tests in x-minute chunks
        while next_index < len(grp.index):
            # use the plus-x-minutes filter
            pxm_df = grp[x_min_filter(grp, next_index)]

            # add a copy of the collapsed dataframe
            dfs.append(collapse_gte_x(pxm_df).copy())

            # next entry in the group that was sent in
            next_index = pxm_df.iloc[-1].name + 1

        return pd.concat(dfs)


    # df.time must already be a datetime column
    dfg = df.groupby('test').apply(grp_x_mins)

    # clean up the dataframe
    dfg = dfg.reset_index(level=1, drop=True).reset_index()
    # renumber id as desired
    dfg['id'] = dfg.index + 1
    # drop the date from the datetime; a datetime was required for indexing
    dfg['time'] = dfg.time.dt.time

    dfg

I get this output:

        test  time             count  id
    0   1105  00:04:25.500000  4      1
    1   1105  00:08:25.600000  4      2
    2   1105  00:10:25.600000  1      3
    3   1105  00:11:26         1      4
    4   1105  00:12:25.600000  1      5
    5   1105  00:17:22         4      6
    6   1105  00:21:25.100000  4      7
    7   1105  00:24:25.700000  0      8
    8   1105  00:25:26.100000  0      9
    9   1105  00:27:25.700000  1      10
    10  1106  00:09:25.900000  0      11
    11  1106  00:14:25.600000  1      12
    12  1106  00:22:25.700000  1      13
    13  1106  00:22:35.700000  0      14
    14  1107  00:22:33.700000  0      15
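
As a sanity check on the rows for test 1105, the greedy window sizes can be recomputed on their own. This is only a sketch: `window_sizes` is a hypothetical helper that is not part of the code above, and the `%M:%S.%f` time format is again an assumption.

```python
import pandas as pd


def window_sizes(times, minutes=3):
    """Count records per greedy relative window (hypothetical helper).

    Each window starts at the first record not yet consumed and
    spans `minutes` minutes, with both ends inclusive.
    """
    times = pd.Series(times).sort_values().reset_index(drop=True)
    sizes = []
    start = 0
    while start < len(times):
        window_end = times.iloc[start] + pd.Timedelta(minutes=minutes)
        # records from `start` onwards that fall inside the window
        n = int((times.iloc[start:] <= window_end).sum())
        sizes.append(n)
        start += n
    return sizes


# Times of all test 1105 records from the sample data (mm:ss.f).
raw_1105 = [
    "01:25.5", "02:25.9", "03:25.5", "04:25.5",
    "05:25.9", "06:25.5", "07:25.9", "08:25.6",
    "10:25.6", "11:26.0", "12:25.6",
    "14:22.0", "15:26.0", "16:25.6", "17:22.0",
    "18:25.7", "19:26.0", "20:25.7", "21:25.1",
    "24:25.7", "25:26.1", "27:25.7",
]
times_1105 = pd.to_datetime(raw_1105, format="%M:%S.%f")

sizes = window_sizes(times_1105)
print(sizes)  # [4, 4, 3, 4, 4, 3]
```

Windows with 4 records collapse to one row each, and the two windows with 3 records stay as they are, which gives the 10 rows shown for test 1105.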
