Correct use of map for mapping a function onto a DataFrame in Python pandas
I've been searching for a while and can't find a concrete answer on this. I'm looking for a best-practice answer. My code works, but I'm not sure whether I'm introducing problems.
# works (older)
# df['action'] = list(map(my_function, df.param1))
# I think this is better?
df['action'] = df['param1'].map(my_function)
Both of these produce the same visible result. I'm not entirely sure how the first, commented-out line works; it's an example I found on the internet and applied here, and it worked. The other uses of map I've found look like the second line, where it is called on a Series object.
So, first question: which of these is better practice, and what is the first one doing?
Second and final question, the more important of the two: map, apply, applymap - I'm not sure which to use here. The first commented-out line of code does not work, while the second gives me what I want.
def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example

# Can't get the df.map function to work?
# Error: map is not an attribute of DataFrame
# df['new_col'] = df.map(my_function, df.param1, df.param1.shift(1),
#                        df.param2.shift(1))

# TypeError: my_function takes 3 positional args, 4 given
# df['new_col'] = df.apply(my_function, args=(df.param1, df.param1.shift(1),
#                                              df.param2.shift(1)))

# Works, not sure why
df['new_col'] = list(map(my_function, df.param1, df.param1.shift(1), df.param2.shift(1)))
I'm trying to compute a result based on two columns of the df, using the current and previous rows. I've tried variations on map and apply called on the df directly (df.map, df.apply) and haven't had success. If I use the list(map(...)) notation, it works great.
Is list(map(...)) acceptable? Is it best practice? Is there a correct way to use apply or map directly on the df object?
Thanks guys, much appreciated.
Edit: MaxU's response below works also. That is, both of these work:
df['new_col'] = list(map(my_function, df.param1, df.param1.shift(1), df.param2.shift(1)))
df['new_col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

# Does not work
df['new_col'] = df.apply(my_function, axis=1, args=(df.param1, df.param1.shift(1), df.param2.shift(1)))

# Does not work
# AttributeError: ("'float' object has no attribute 'shift'", 'occurred at index 2000-01-04 00:00:00')
# It would work if I removed shift(), but that's not what I need.
df['new_col'] = df.apply(lambda x: my_function(x.param1, x.param1.shift(1), x.param2.shift(1)))
I'm still unclear on the proper syntax for using apply here, and on whether any of these three methods is superior to the others (I'm guessing list(map(...)) is the "worst" of the three, since it iterates and isn't vectorized).
So, first question: which of these is better practice, and what is the first one doing?
df['action'] = df['param1'].map(my_function)
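is more idiomatic, generally faster, and more reliable, because the result is a Series aligned with the original index. The first line, list(map(my_function, df.param1)), uses Python's built-in map to call my_function on each element and collects the results in a plain list. (With a plain Python function, Series.map also calls it once per element, so it is not vectorized in the NumPy sense.)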
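To make the difference concrete, here is a minimal sketch comparing the two forms. The one-argument my_function, the toy data, and the string index are all assumptions invented for illustration, not taken from the question.

```python
import pandas as pd


def my_function(x):
    # Hypothetical stand-in for the question's one-argument function.
    return "buy" if x > 0 else "sell"


# Toy frame with a non-default index, to make index alignment visible.
df = pd.DataFrame({"param1": [1.5, -2.0, 0.5]}, index=["a", "b", "c"])

# Built-in map: iterates over the column's values and yields a plain list.
as_list = list(map(my_function, df.param1))

# Series.map: applies the function element-wise and returns a Series
# that carries the same index as df["param1"].
as_series = df["param1"].map(my_function)

# Assigning the Series aligns on the index; assigning the list relies on
# position and length only.
df["action"] = as_series
```

Both forms give the same values here; the Series version simply keeps the index attached, which matters once the frame has been filtered or reordered.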
Second and final question, the more important of the two: map, apply, applymap - I'm not sure which to use here. The first commented-out line of code does not work, while the second gives me what I want.
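pandas does not have DataFrame.map(), only Series.map(). If you need access to multiple columns in the mapping function, you can use DataFrame.apply(). (Newer releases, starting with pandas 2.1, do add an element-wise DataFrame.map() as the replacement for applymap(), but it still passes single values to the function, not several columns.)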
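As a rough sketch of what DataFrame.apply() hands to the function: with axis=1 each call receives one row as a Series, so attribute access such as row.param1 yields a scalar. That is why calling .shift() inside the function fails. The frame and the combine helper below are made up for illustration.

```python
import pandas as pd

# Toy data; the column names follow the question, the values are invented.
df = pd.DataFrame({"param1": [1.0, 2.0, 3.0], "param2": [10.0, 20.0, 30.0]})

# Records whether the per-row value exposes a .shift() method.
has_shift = []


def combine(row):
    # With axis=1, `row` is a Series holding a single row,
    # so row.param1 is a scalar, not a column.
    has_shift.append(hasattr(row.param1, "shift"))
    return row.param1 * row.param2


# One call per row; the results come back as a Series indexed like df.
row_wise = df.apply(combine, axis=1)
```

Anything that depends on neighbouring rows, such as shift(1), therefore has to be computed on the whole columns before the row-wise apply runs.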
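Demo (with axis=1 each row arrives as scalars, so shift the columns first; prev1 and prev2 are illustrative column names):
tmp = df.assign(prev1=df.param1.shift(1), prev2=df.param2.shift(1))
df['new_col'] = tmp.apply(lambda x: my_function(x.param1, x.prev1, x.prev2), axis=1)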
or just:
df['new_col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))
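A small sketch, on invented data, suggesting that the direct call, list(map(...)), and a row-wise apply over pre-shifted columns all agree. The prev1 and prev2 names are hypothetical helpers, and the first row is NaN in every variant because shift(1) has no previous row to draw from.

```python
import numpy as np
import pandas as pd


def my_function(param1, param2, param3):
    return param1 * param2 * param3


# Toy data; only the column names come from the question.
df = pd.DataFrame({"param1": [1.0, 2.0, 3.0, 4.0],
                   "param2": [10.0, 20.0, 30.0, 40.0]})

# Previous-row values, computed once on whole columns.
prev1 = df.param1.shift(1)
prev2 = df.param2.shift(1)

# 1) Direct call: the multiplication operates on whole Series at once.
vectorized = my_function(df.param1, prev1, prev2)

# 2) Built-in map: one Python-level call per row, collected into a list.
looped = pd.Series(list(map(my_function, df.param1, prev1, prev2)),
                   index=df.index)

# 3) Row-wise apply over a frame that already holds the shifted columns.
shifted = df.assign(prev1=prev1, prev2=prev2)
applied = shifted.apply(
    lambda r: my_function(r.param1, r.prev1, r.prev2), axis=1)

df["new_col"] = vectorized
```

The direct call only works because my_function uses operations that Series support natively; a function with Python-level branching on its arguments would need one of the element-wise forms instead.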