python - Missing samples of a dataframe in pandas

April 15, 2010

My df:

    In [163]: df.head()
    Out[163]:
                           x-axis    y-axis    z-axis
    time
    2017-07-27 06:23:08 -0.107666 -0.068848  0.963623
    2017-07-27 06:23:08 -0.105225 -0.070068  0.963867
    .....

I set a datetime index. Since the sampling rate (10 Hz) is not constant in the dataframe, some seconds have only 8 or 9 samples.
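A quick way to see which seconds are short of samples is to count rows per whole second. This is a minimal sketch assuming a DatetimeIndex like the one above; the values are hypothetical, and the threshold of 10 rows per second simply follows from the 10 Hz rate.

```python
import pandas as pd

# Hypothetical readings with whole-second timestamps, as in the question:
# second 06:23:08 has 3 samples, second 06:23:09 has 2.
idx = pd.to_datetime(
    ["2017-07-27 06:23:08"] * 3 + ["2017-07-27 06:23:09"] * 2
)
df = pd.DataFrame({"x-axis": [0.1, 0.2, 0.3, 0.4, 0.5]}, index=idx)

# Count rows per whole second; at 10 Hz a complete second has 10 rows.
counts = df.groupby(df.index.floor("s")).size()
short_seconds = counts[counts < 10]
print(short_seconds)
```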

I would like to assign milliseconds to the datetimes (06:23:08.100, 06:23:08.200, etc.) and then interpolate the missing samples.

Any ideas how to do this in pandas?

First, let's create some sample data that may resemble yours.

    import pandas as pd
    from datetime import timedelta
    from datetime import datetime

    base = datetime.now()
    date_list = [base - timedelta(days=x) for x in range(0, 2)]
    values = [v for v in range(2)]
    df = pd.DataFrame.from_dict({'date': date_list, 'values': values})

    df = df.set_index('date')
    df
                                values
    date
    2017-08-18 20:42:08.563878       0
    2017-08-17 20:42:08.563878       1

Now create a data frame with a datapoint every 100 milliseconds.

    min_val = df.index.min()
    max_val = df.index.max()

    # Step from the earliest to the latest timestamp in 100 ms increments
    all_val = []
    while min_val <= max_val:
        all_val.append(min_val)
        min_val += timedelta(milliseconds=100)
    # len(all_val) == 864001

    df_new = pd.DataFrame.from_dict({'date': all_val})
    df_new = df_new.set_index('date')
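The same inclusive 100 ms grid can be built without a Python loop using `pd.date_range`. A minimal sketch, assuming fixed hypothetical endpoints one day apart instead of `datetime.now()` so the result is reproducible:

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical endpoints one day apart, like the sample data above.
base = datetime(2017, 8, 18, 20, 42, 8, 563878)
start = base - timedelta(days=1)

# date_range builds the same inclusive 100 ms grid as the while loop,
# anchored at `start`, without a Python-level loop.
grid = pd.date_range(start=start, end=base, freq="100ms", name="date")
print(len(grid))  # 864001

# An empty frame indexed by the grid, equivalent to df_new.
df_new = pd.DataFrame(index=grid)
```

Both endpoints are included because `end` falls exactly on the 100 ms grid.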

Let's join both data frames; the missing rows will have an index but no values.

    final_df = df_new.join(df)
    final_df
                                values
    date
    2017-08-17 20:42:08.563878     1.0
    2017-08-17 20:42:08.663878     NaN
    2017-08-17 20:42:08.763878     NaN
    2017-08-17 20:42:08.863878     NaN
    2017-08-17 20:42:08.963878     NaN
    2017-08-17 20:42:09.063878     NaN
    2017-08-17 20:42:09.163878     NaN
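Because `df_new` has no columns, this join amounts to a reindex onto the grid. A minimal sketch of the same step with `DataFrame.reindex`, using two hypothetical readings half a second apart:

```python
import pandas as pd

# Hypothetical sparse readings on a DatetimeIndex.
df = pd.DataFrame(
    {"values": [1.0, 0.0]},
    index=pd.DatetimeIndex(
        ["2017-08-17 20:42:08.5", "2017-08-17 20:42:09.0"], name="date"
    ),
)

# Target grid every 100 ms between the first and last reading.
grid = pd.date_range(df.index.min(), df.index.max(), freq="100ms", name="date")

# reindex keeps rows that sit on the grid and inserts NaN rows elsewhere.
final_df = df.reindex(grid)
print(final_df)
```

As with the left join, any rows of `df` whose timestamps do not fall exactly on the grid are dropped.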

Now interpolate the data:

    final_df.interpolate()
                                   values
    date
    2017-08-17 20:42:08.563878  1.000000
    2017-08-17 20:42:08.663878  0.999999
    2017-08-17 20:42:08.763878  0.999998
    2017-08-17 20:42:08.863878  0.999997
    2017-08-17 20:42:08.963878  0.999995
    2017-08-17 20:42:09.063878  0.999994
    2017-08-17 20:42:09.163878  0.999993
    2017-08-17 20:42:09.263878  0.999992
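`interpolate()` defaults to `method='linear'`, which treats rows as equally spaced. On a regular 100 ms grid that is fine, but on irregular raw timestamps `method='time'` weights by the actual time gaps instead. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical readings at irregular times with one missing value.
s = pd.Series(
    [0.0, None, 3.0],
    index=pd.DatetimeIndex(
        ["2017-08-17 20:42:08.0", "2017-08-17 20:42:08.1", "2017-08-17 20:42:08.3"]
    ),
)

# 'linear' uses row positions: the gap is halfway between rows, so 1.5.
linear_mid = s.interpolate(method="linear").iloc[1]

# 'time' uses elapsed time: 0.1 s of the 0.3 s span, so about 1.0.
time_mid = s.interpolate(method="time").iloc[1]
print(linear_mid, time_mid)
```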

For other interpolation strategies, see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.interpolate.html
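For data like the question's (whole-second timestamps, some samples missing), one possible end-to-end sketch follows: number the samples within each second, offset them by 100 ms steps, reindex onto a 10 Hz grid, and interpolate. It assumes the rows are in time order and that samples are 100 ms apart; the column values are hypothetical.

```python
import pandas as pd

# Hypothetical data: whole-second timestamps, second 06:23:08 has only
# 3 of its 10 samples.
raw = pd.DataFrame({
    "time": pd.to_datetime(["2017-07-27 06:23:08"] * 3 + ["2017-07-27 06:23:09"]),
    "x-axis": [0.0, 0.1, 0.2, 1.0],
})

# 1. Spread the samples of each second at 100 ms steps (.000, .100, ...).
pos = raw.groupby("time").cumcount()
raw["time"] = raw["time"] + pd.to_timedelta(pos * 100, unit="ms")
df = raw.set_index("time")

# 2. Reindex onto a regular 10 Hz grid; missing samples become NaN.
grid = pd.date_range(df.index.min(), df.index.max(), freq="100ms", name="time")
df = df.reindex(grid)

# 3. Fill the gaps by interpolating over time.
df = df.interpolate(method="time")
print(df)
```

Because samples are numbered from .000 upward, this approach assumes the missing samples of a second are the last ones; if they can be anywhere in the second, the raw timestamps alone cannot tell where the gaps were.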

Update: per the discussion in the comments:

Say our initial data does not have millisecond information.

    # Assumes df_new here holds whole-second timestamps in a regular
    # 'date' column (not the index).
    df_new_date_without_miliseconds = df_new['date']
    df_new_date_without_miliseconds[0]
    # Timestamp('2017-08-17 21:45:49')

    max_value_date = df_new_date_without_miliseconds[0]
    max_value_miliseconds = df_new_date_without_miliseconds[0]

    updated_dates = []
    for val in df_new_date_without_miliseconds:
        if val == max_value_date:
            # still in the current second: step 100 ms past the previous sample
            val = max_value_miliseconds + timedelta(milliseconds=100)
            max_value_miliseconds = val
        elif val > max_value_date:
            # a new second starts: keep it at .000
            max_value_date = val + timedelta(milliseconds=0)
            max_value_miliseconds = val
        updated_dates.append(val)

    Output:

    [Timestamp('2017-08-17 21:45:49.100000'),
     Timestamp('2017-08-17 21:45:49.200000'),
     Timestamp('2017-08-17 21:45:49.300000'),
     Timestamp('2017-08-17 21:45:50'),
     Timestamp('2017-08-17 21:45:50.100000'),

Assign the new values to the dataframe:

    df_new['date'] = updated_dates
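The loop can also be replaced by a vectorized version built on `groupby(...).cumcount()`, which numbers the rows within each second. A minimal sketch, assuming the timestamps are sorted and held in a plain Series; unlike the loop, it leaves the first sample of the very first second at .000 as well:

```python
import pandas as pd

# Hypothetical whole-second timestamps, several samples per second.
dates = pd.Series(pd.to_datetime(
    ["2017-08-17 21:45:49"] * 3 + ["2017-08-17 21:45:50"] * 2
))

# Position of each row within its second (0, 1, 2, ...).
position = dates.groupby(dates).cumcount()

# Add 100 ms per position.
updated = dates + pd.to_timedelta(position * 100, unit="ms")
print(updated.tolist())
```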
