python - Convert column with mixed text values and None to integer lists efficiently


Imagine I have a column with these values:

```python
import pandas as pd

data = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
```

I want the output to be:

```
[[1, 2, 3], [4, 5, 6], None]
```

In other words, I want to split comma-delimited strings into lists of integers while leaving None values as they are.

This function works fine with `apply`:

```python
def parse_text_vector(s):
    if s is None:
        return None
    else:
        return list(map(int, s.split(',')))  # list() needed on Python 3, where map() is lazy
```

as in this example:

```python
df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
result = df[0].apply(parse_text_vector)
```
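To measure how `apply` scales before trying alternatives, it can help to time it on synthetic data. A minimal sketch using the standard-library `timeit` module; the 100,000-fold repetition of the question's rows is an arbitrary assumption, and the printed time depends entirely on the machine:

```python
import timeit

import pandas as pd


def parse_text_vector(s):
    # The question's parser, returning a list (needed on Python 3).
    if s is None:
        return None
    return list(map(int, s.split(',')))


# Synthetic data (an assumption): the question's three rows repeated 100,000 times.
df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]] * 100_000)

# Time one full pass of Series.apply over column 0.
seconds = timeit.timeit(lambda: df[0].apply(parse_text_vector), number=1)
print(f"apply: {seconds:.3f} s for {len(df):,} rows")
```

The same `timeit.timeit(lambda: ..., number=1)` call can wrap any of the alternatives below, so they are compared on identical input.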

But across millions of rows, this gets quite slow. I was hoping to improve the runtime by calling the function on the whole array at once, along the lines of `parse_text_vector(df.values)`, but that leads to:

```
In [61]: parse_text_vector(df.values)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-61-527d5f9f2b84> in <module>()
----> 1 parse_text_vector(df.values)

<ipython-input-49-09dcd8f24ab3> in parse_text_vector(s)
      4         return None
      5     else:
----> 6         return map(int, s.split(','))

AttributeError: 'numpy.ndarray' object has no attribute 'split'
```
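The error arises because `df.values` is a 2-D NumPy object array (shape `(3, 1)` here), not a single string, so it has no `split` method; `parse_text_vector` only handles one scalar at a time. A minimal sketch, assuming the single-column `df` from the question, that flattens the array with `ravel()` and calls the function per element in a plain list comprehension. This is still one Python call per row, though it usually carries somewhat less overhead than `Series.apply` (worth timing on real data):

```python
import pandas as pd


def parse_text_vector(s):
    # Parse one comma-delimited string into a list of ints; pass None through.
    if s is None:
        return None
    return list(map(int, s.split(',')))


df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
print(df.values.shape)  # (3, 1): a 2-D array of Python objects, not a string

# Flatten to 1-D and parse each scalar element individually.
result = [parse_text_vector(s) for s in df.values.ravel()]
print(result)  # [[1, 2, 3], [4, 5, 6], None]
```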

How can I make this work? Or, failing that, how can I optimize this so it doesn't take tens of minutes to process a million-line DataFrame?

Use the `.str.split` string method on the column, then convert the result to a list:

```
In [9]: df
Out[9]:
    col1
0  1,2,3
1  4,5,6
2   None

In [10]: df.col1.str.split(',').tolist()
Out[10]: [['1', '2', '3'], ['4', '5', '6'], None]
```

To convert the inner list elements to integers, you can do the conversion with `map` inside a list comprehension:

```
In [22]: [list(map(int, x)) if isinstance(x, list) else x for x in df.col1.str.split(',').tolist()]
Out[22]: [[1, 2, 3], [4, 5, 6], None]
```
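If the per-row Python calls are still the bottleneck, one alternative (not part of the original answer) is to join all non-missing strings, split and convert every token with a single NumPy cast, and then cut the flat array back into rows. A sketch, assuming every non-missing entry is a well-formed, non-empty list such as `'1,2,3'` (no empty strings or stray spaces); `parse_column_bulk` is a hypothetical helper name. Cutting the result back into lists still loops in Python, so whether this beats the list comprehension depends on the data and should be measured:

```python
import numpy as np
import pandas as pd


def parse_column_bulk(col: pd.Series) -> list:
    """Parse comma-delimited integer strings in bulk; missing entries become None.

    Assumes every non-missing entry is a non-empty, well-formed list like '1,2,3'.
    """
    mask = col.notna().to_numpy()
    strings = col[mask].tolist()
    if not strings:
        return [None] * len(col)
    # Number of integers in each non-missing string.
    counts = [s.count(',') + 1 for s in strings]
    # Join everything into one string and convert all tokens in one cast.
    flat = np.array(','.join(strings).split(',')).astype(np.int64)
    # Cut the flat array back into one chunk per non-missing row.
    chunks = np.split(flat, np.cumsum(counts)[:-1])
    parsed = (chunk.tolist() for chunk in chunks)
    # Re-insert None at the positions of the missing rows.
    return [next(parsed) if present else None for present in mask]


df = pd.DataFrame({'col1': ['1,2,3', '4,5,6', None]})
print(parse_column_bulk(df['col1']))  # [[1, 2, 3], [4, 5, 6], None]
```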
