python - Missing samples of a dataframe in pandas


My df:

    In [163]: df.head()
    Out[163]:
                           x-axis    y-axis    z-axis
    time
    2017-07-27 06:23:08 -0.107666 -0.068848  0.963623
    2017-07-27 06:23:08 -0.105225 -0.070068  0.963867
    .....

I set the index to a datetime. Since the sampling rate (10 Hz) is not constant in the dataframe, some seconds have only 8 or 9 samples.

  1. I would like to specify milliseconds on the datetime (06:23:08**.100**, 06:23:08**.200**, etc.)
  2. I would also like to interpolate the missing samples.

Any ideas how to do this in pandas?

First, let's create some sample data that may resemble your data.

    import pandas as pd
    from datetime import datetime, timedelta

    # Two timestamps one day apart, used as the index
    base = datetime.now()
    date_list = [base - timedelta(days=x) for x in range(0, 2)]
    values = [v for v in range(2)]
    df = pd.DataFrame.from_dict({'date': date_list, 'values': values})

    df = df.set_index('date')
    df

                                values
    date
    2017-08-18 20:42:08.563878       0
    2017-08-17 20:42:08.563878       1

Now let's create a data frame with one datapoint every 100 milliseconds.

    min_val = df.index.min()
    max_val = df.index.max()

    # Walk from the earliest to the latest timestamp in 100 ms steps
    all_val = []
    while min_val <= max_val:
        all_val.append(min_val)
        min_val += timedelta(milliseconds=100)
    # len(all_val) 864001

    df_new = pd.DataFrame.from_dict({'date': all_val})
    df_new = df_new.set_index('date')
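As an aside, the loop above can probably be replaced by `pd.date_range`, which builds the same kind of regular grid in one call. This is a minimal sketch with fixed, made-up endpoints (`start` and `end` are illustrative values, not taken from the data above):

```python
import pandas as pd

# Hypothetical endpoints one day apart (illustration only)
start = pd.Timestamp("2017-08-17 20:42:08")
end = pd.Timestamp("2017-08-18 20:42:08")

# One timestamp every 100 ms, both endpoints included
grid = pd.date_range(start=start, end=end, freq="100ms")

# Empty frame that only carries the regular index
df_new = pd.DataFrame(index=grid)
df_new.index.name = "date"
```

With real data, `start` and `end` would be `df.index.min()` and `df.index.max()`, as in the loop version.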

Let's join both data frames so that the missing rows have an index entry but no values.

    final_df = df_new.join(df)
    final_df

                                 values
    date
    2017-08-17 20:42:08.563878     1.0
    2017-08-17 20:42:08.663878     NaN
    2017-08-17 20:42:08.763878     NaN
    2017-08-17 20:42:08.863878     NaN
    2017-08-17 20:42:08.963878     NaN
    2017-08-17 20:42:09.063878     NaN
    2017-08-17 20:42:09.163878     NaN

Now interpolate the data:

    final_df.interpolate()

                                  values
    date
    2017-08-17 20:42:08.563878  1.000000
    2017-08-17 20:42:08.663878  0.999999
    2017-08-17 20:42:08.763878  0.999998
    2017-08-17 20:42:08.863878  0.999997
    2017-08-17 20:42:08.963878  0.999995
    2017-08-17 20:42:09.063878  0.999994
    2017-08-17 20:42:09.163878  0.999993
    2017-08-17 20:42:09.263878  0.999992
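If the index already carries millisecond timestamps that sit on the 100 ms grid, the grid, join and interpolate steps can likely be collapsed with `resample`. The sketch below uses a small made-up frame (`df_obs` and its values are hypothetical) in which the `.200` sample is missing:

```python
import pandas as pd

# Hypothetical 10 Hz samples with one gap at 06:23:08.200
idx = pd.to_datetime([
    "2017-07-27 06:23:08.000",
    "2017-07-27 06:23:08.100",
    "2017-07-27 06:23:08.300",
    "2017-07-27 06:23:08.400",
])
df_obs = pd.DataFrame({"x-axis": [0.0, 1.0, 3.0, 4.0]}, index=idx)

# asfreq() puts the data on a regular 100 ms grid; missing slots become NaN
regular = df_obs.resample("100ms").asfreq()

# Linear interpolation fills the NaN rows from their neighbours
filled = regular.interpolate()
```

This only works when the index has no duplicate timestamps, so for data with second resolution the milliseconds have to be assigned first.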

Other interpolation strategies are described here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.interpolate.html
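To illustrate how the strategy matters: the default `method="linear"` treats rows as equally spaced, while `method="time"` weights by the actual gap between timestamps. The two agree on a regular grid and differ on an irregular one. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Irregular spacing: 100 ms to the first neighbour, 300 ms to the second
idx = pd.to_datetime([
    "2017-08-17 20:42:08.000",
    "2017-08-17 20:42:08.100",
    "2017-08-17 20:42:08.400",
])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

# Ignores the index: the gap is filled with the midpoint of its neighbours
linear = s.interpolate()

# Uses the timestamps: the gap sits a quarter of the way between them
by_time = s.interpolate(method="time")
```

Here `linear` fills the gap with 2.0, whereas `by_time` fills it with 1.0.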

Update: as per the discussion in the comments:

Say our initial data does not have millisecond information.

    # Assumes df_new here has a 'date' column with second-resolution timestamps
    df_new_date_without_miliseconds = df_new['date']
    df_new_date_without_miliseconds[0]
    # Timestamp('2017-08-17 21:45:49')

    max_value_date = df_new_date_without_miliseconds[0]
    max_value_miliseconds = df_new_date_without_miliseconds[0]

    updated_dates = []
    for val in df_new_date_without_miliseconds:
        if val == max_value_date:
            # Same second as before: move 100 ms past the last assigned value
            val = max_value_miliseconds + timedelta(milliseconds=100)
            max_value_miliseconds = val
        elif val > max_value_date:
            # New second: restart from the whole second
            max_value_date = val + timedelta(milliseconds=0)
            max_value_miliseconds = val
        updated_dates.append(val)

Output:

    [Timestamp('2017-08-17 21:45:49.100000'),
     Timestamp('2017-08-17 21:45:49.200000'),
     Timestamp('2017-08-17 21:45:49.300000'),
     Timestamp('2017-08-17 21:45:50'),
     Timestamp('2017-08-17 21:45:50.100000'),

Assign the new values to the dataframe:

    df_new['date'] = updated_dates
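A vectorized variant of the same idea, sketched under the assumption that rows are already in chronological order and that every sample within a second should be offset by 100 ms times its position. The frame `df_sec` and its values are made up for illustration. Unlike the loop above, this version starts every second at `.000`, including the first one:

```python
import pandas as pd

# Hypothetical data with second resolution and repeated timestamps
df_sec = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:50",
        "2017-08-17 21:45:50",
    ]),
    "values": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Position of each row within its second: 0, 1, 2, ...
pos = df_sec.groupby("date").cumcount()

# Shift each row by 100 ms per position: .000, .100, .200, ...
df_sec["date"] = df_sec["date"] + pd.to_timedelta(pos * 100, unit="ms")
```

After this step the timestamps are unique, so the column can be set as the index and put on a regular grid before interpolating.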
