Correct use of map for mapping a function onto a df, python pandas

June 15, 2015

I've been searching for a while and can't find anything concrete on this. I'm looking for a best-practice answer. My code works, but I'm not sure if I'm introducing problems.

# df['action'] = list(map(my_function, df.param1))  # works, older version

# I think this is equivalent?
df['action'] = df['param1'].map(my_function)

Both of these produce the same visible result. I'm not entirely sure how the first, commented-out line works. It was an example I found on the internet, and when I applied it here it worked. The other uses of map I've found look like the 2nd line, called on a Series object.
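Here is a minimal sketch of what each of the two lines does. It assumes a made-up one-argument `my_function` and a tiny three-row frame, both hypothetical. The built-in `map` iterates over the column's values and returns an iterator, which `list()` turns into a plain list with no index. `Series.map` makes the same per-element call but returns a Series that keeps the frame's index.

```python
import pandas as pd

def my_function(x):
    # hypothetical one-argument function, for illustration only
    return x * 2

df = pd.DataFrame({'param1': [1, 2, 3]}, index=['a', 'b', 'c'])

# Built-in map: iterates over the Series values, returns an iterator;
# list() materialises it into a plain Python list (no index attached).
as_list = list(map(my_function, df.param1))

# Series.map: calls my_function on each element and returns a new Series
# that keeps df's index.
as_series = df['param1'].map(my_function)

print(as_list)                # [2, 4, 6]
print(as_series.tolist())     # [2, 4, 6]
print(list(as_series.index))  # ['a', 'b', 'c']
```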

So, first question: which of these is better practice, and what is the first one doing?

Second and final question, and the more important of the two: map, apply, applymap. I'm not sure which to use here. The commented-out lines in the code below do not work, while the uncommented one gives me what I want.
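One quick way to see how the three differ is to run each one on a small made-up frame. The columns and lambdas below are invented for illustration. `applymap` was renamed to `DataFrame.map` in pandas 2.1, so the sketch uses whichever name the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({'param1': [1, 2, 3], 'param2': [10, 20, 30]})

# Series.map: one value in, one value out, for a single column.
doubled = df['param1'].map(lambda v: v * 2)

# DataFrame.apply with axis=1: the function receives a whole row
# (a Series), so it can read several columns at once.
row_products = df.apply(lambda row: row['param1'] * row['param2'], axis=1)

# Element-wise over the whole frame: DataFrame.applymap in older pandas,
# renamed to DataFrame.map in pandas 2.1+.
elementwise = df.map if hasattr(df, 'map') else df.applymap
negated = elementwise(lambda v: -v)

print(doubled.tolist())          # [2, 4, 6]
print(row_products.tolist())     # [10, 40, 90]
print(negated.values.tolist())   # [[-1, -10], [-2, -20], [-3, -30]]
```

Only the row-wise `apply` lets the function see more than one column at a time.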

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example

# Can't get a df.map function to work?
# Error: map is not an attribute of DataFrame
# df['new_col'] = df.map(my_function, df.param1, df.param1.shift(1),
#                        df.param2.shift(1))

# TypeError: my_function takes 3 positional args, 4 given
# df['new_col'] = df.apply(my_function, args=(df.param1, df.param1.shift(1),
#                          df.param2.shift(1)))

# Works, not sure why
df['new_col'] = list(map(my_function, df.param1, df.param1.shift(1),
                         df.param2.shift(1)))

I'm trying to compute a result based on 2 columns of the df, using both the current and the previous rows. I've tried variations on map and apply called on the df directly (df.map, df.apply) and haven't had success. If I use the list(map(...)) notation, it works great.
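The list(map(...)) version works because Python's built-in `map` accepts several iterables and calls the function with one element from each, in lockstep, much like `zip`. Here is a minimal sketch using the question's toy `my_function` and a made-up three-row frame. `shift(1)` lines each row up with the previous row's value, and the first row becomes NaN.

```python
import pandas as pd

def my_function(param1, param2, param3):
    # same toy function as in the question
    return param1 * param2 * param3

df = pd.DataFrame({'param1': [1.0, 2.0, 3.0], 'param2': [4.0, 5.0, 6.0]})

# shift(1) moves each value down one row, so row i now holds row i-1's value;
# the first row has no predecessor and becomes NaN.
prev1 = df.param1.shift(1)
prev2 = df.param2.shift(1)

# With several iterables, the built-in map walks them in lockstep and calls
# my_function(a, b, c) with one element from each, just like zip() would.
result = list(map(my_function, df.param1, prev1, prev2))
same = [my_function(a, b, c) for a, b, c in zip(df.param1, prev1, prev2)]

print(result)  # [nan, 8.0, 30.0]
```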

Is list(map(...)) acceptable? Is it best practice? Is there a correct way to use apply or map directly on the df object?

Thanks guys, much appreciated.

Edit: MaxU's response below also works. That is, both of these work:

df['new_col'] = list(map(my_function, df.param1, df.param1.shift(1),
                         df.param2.shift(1)))
df['new_col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

# does not work
df['new_col'] = df.apply(my_function, axis=1, args=(df.param1,
                         df.param1.shift(1), df.param2.shift(1)))

# does not work
# AttributeError: ("'float' object has no attribute 'shift'",
#                  'occurred at index 2000-01-04 00:00:00')
# works if I remove shift(), but that's not what I need
df['new_col'] = df.apply(lambda x: my_function(x.param1, x.param1.shift(1),
                         x.param2.shift(1)))

I'm still unclear on the proper syntax for using apply here, and on whether any of these 3 methods is superior to the others. I'm guessing list(map(...)) is the "worst" of the 3, since it iterates and isn't vectorized.
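Both failing `apply` calls can be reproduced on a small made-up frame. The sketch below uses the question's toy `my_function` with invented data. The exact wording of the error messages varies between pandas and NumPy versions, so the sketch only records which exception type was raised.

```python
import pandas as pd

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # toy function from the question

df = pd.DataFrame({'param1': [1.0, 2.0, 3.0], 'param2': [4.0, 5.0, 6.0]})
caught = []

# DataFrame.apply always passes the column (or the row, with axis=1) as the
# first positional argument and appends `args` after it, so my_function
# receives 1 + 3 = 4 arguments and raises TypeError.
try:
    df.apply(my_function, axis=1,
             args=(df.param1, df.param1.shift(1), df.param2.shift(1)))
except TypeError as exc:
    caught.append(type(exc).__name__)

# With axis=1 each x is one row, so x.param1 is a single float, and a
# scalar has no .shift() method: AttributeError.
try:
    df.apply(lambda x: my_function(x.param1, x.param1.shift(1),
                                   x.param2.shift(1)), axis=1)
except AttributeError as exc:
    caught.append(type(exc).__name__)

print(caught)  # ['TypeError', 'AttributeError']
```

In both cases the root cause is the same. Inside a row-wise `apply` the function only sees one row, so anything that needs the previous row has to be prepared before `apply` is called.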

So, first question: which of these is better practice, and what is the first one doing?

df['action'] = df['param1'].map(my_function)

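is more idiomatic and more reliable, because it returns a Series that keeps df's index. Note, though, that with a plain Python function Series.map still calls that function once per element, so it is not vectorized in the NumPy sense.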
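Here is one concrete sense in which the Series version is more reliable. A Series carries its index, so pandas aligns it when you assign it to a column, while a plain list can only be matched by position. The sketch below uses an invented frame, a filtered subset, and a hypothetical one-argument `my_function`.

```python
import pandas as pd

def my_function(x):
    # hypothetical one-argument function, for illustration only
    return x * 10

df = pd.DataFrame({'param1': [1, 2, 3, 4]}, index=[10, 20, 30, 40])
subset = df[df.param1 > 2]  # rows 30 and 40 only

# A Series carries its index, so pandas aligns it on assignment:
# rows missing from `subset` simply get NaN.
df['from_series'] = subset['param1'].map(my_function)

# A plain list has no index; pandas can only match it by position,
# so a length mismatch is an error rather than a silent misalignment.
try:
    df['from_list'] = list(map(my_function, subset['param1']))
except ValueError as exc:
    print('ValueError:', exc)

print(df['from_series'].tolist())  # [nan, nan, 30.0, 40.0]
```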

Second and final question, and the more important of the two: map, apply, applymap. I'm not sure which to use here. The commented-out lines in the code below do not work, while the uncommented one gives me what I want.

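Pandas does not have DataFrame.map(), only Series.map(). If you need to access multiple columns in the mapping function, you can use DataFrame.apply() with axis=1. (Since pandas 2.1, DataFrame.map() does exist as the new name for the element-wise DataFrame.applymap(), but it still passes one value at a time, so it does not help here.)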
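The correct shape of a row-wise `apply` call with extra arguments is that the row always arrives first and whatever is in `args` comes after it. Here is a minimal sketch with a hypothetical helper `scaled_product` and an invented frame.

```python
import pandas as pd

def scaled_product(row, factor):
    # hypothetical helper: the row always comes first, then whatever is in args
    return row['param1'] * row['param2'] * factor

df = pd.DataFrame({'param1': [1, 2, 3], 'param2': [4, 5, 6]})

# axis=1 hands each row to scaled_product; args=(10,) supplies `factor`
df['new_col'] = df.apply(scaled_product, axis=1, args=(10,))

print(df['new_col'].tolist())  # [40, 100, 180]
```

This is why passing whole columns through `args` fails. They are appended after the row, so the function receives more positional arguments than it declares.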

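Demo:

# inside apply(..., axis=1) each x is a single row, so x.param1 is a scalar
# with no .shift(); build the shifted columns first, then apply row by row
df['new_col'] = (df.assign(prev1=df.param1.shift(1), prev2=df.param2.shift(1))
                   .apply(lambda x: my_function(x.param1, x.prev1, x.prev2),
                          axis=1))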

Or, since my_function only uses arithmetic that works on whole columns, just:

df['new_col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))
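As a sanity check, the sketch below builds a small invented frame and confirms that the three working variants from this thread produce the same numbers. It uses the question's toy `my_function`, and the column values are made up. Only the third version hands whole columns to `my_function`, so the multiplication runs in NumPy rather than as one Python call per row. That is generally why it is the one to prefer when the function is built from arithmetic like this.

```python
import pandas as pd

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # toy function from the question

df = pd.DataFrame({'param1': [1.0, 2.0, 3.0, 4.0],
                   'param2': [5.0, 6.0, 7.0, 8.0]})
prev1, prev2 = df.param1.shift(1), df.param2.shift(1)

# 1. Built-in map over three aligned iterables: one Python call per row.
via_map = pd.Series(list(map(my_function, df.param1, prev1, prev2)),
                    index=df.index)

# 2. Row-wise apply on a frame that already holds the shifted columns.
via_apply = (df.assign(prev1=prev1, prev2=prev2)
               .apply(lambda x: my_function(x.param1, x.prev1, x.prev2),
                      axis=1))

# 3. Vectorized: my_function only uses *, so it works on whole Series at once.
via_vector = my_function(df.param1, prev1, prev2)

print(via_vector.tolist())  # [nan, 10.0, 36.0, 84.0]
```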
