python - Convert column with mixed text values and None to integer lists efficiently
Imagine I have a column of values:
data = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
I want the output to be:
[[[1,2,3]], [[4,5,6]], [None]]
In other words, splitting comma-delimited strings into lists while ignoring None values.
This function works fine with apply:
def parse_text_vector(s):
    if s is None:
        return None
    else:
        return map(int, s.split(','))

As in this example:
df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
result = df[0].apply(parse_text_vector)

But across millions of rows, this gets quite slow. I was hoping to improve the runtime by doing something along the lines of
parse_text_vector(df.values), which leads to:
In [61]: parse_text_vector(df.values)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-61-527d5f9f2b84> in <module>()
----> 1 parse_text_vector(df.values)

<ipython-input-49-09dcd8f24ab3> in parse_text_vector(s)
      4         return None
      5     else:
----> 6         return map(int, s.split(','))

AttributeError: 'numpy.ndarray' object has no attribute 'split'

How can I get this to work? Or otherwise optimize it so that it doesn't take tens of minutes to process a million-line dataframe?
Use str.split on the column (df.col1.str.split), then convert the result to a list:
In [9]: df
Out[9]:
    col1
0  1,2,3
1  4,5,6
2   None

In [10]: df.col1.str.split(',').tolist()
Out[10]: [['1', '2', '3'], ['4', '5', '6'], None]

To convert the inner list elements to integers, you can do the conversion with map inside a list comprehension:
In [22]: [list(map(int, x)) if isinstance(x, list) else x for x in df.col1.str.split(',').tolist()]
Out[22]: [[1, 2, 3], [4, 5, 6], None]
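For convenience, the two steps above can be wrapped into one self-contained script. This is only a sketch: the helper name parse_int_lists and the column name col1 are illustrative, not part of pandas. Depending on the pandas version and column dtype, the missing entry may come back as None or as NaN, so the code passes through anything that is not a list rather than testing for None specifically. Note that on Python 3 map returns a lazy iterator, which is why each row is wrapped in list().

```python
import pandas as pd


def parse_int_lists(series):
    """Split comma-delimited strings into lists of ints.

    Missing entries (None/NaN) are passed through unchanged.
    parse_int_lists is a hypothetical helper name used for illustration.
    """
    # str.split turns each string into a list of substrings and
    # leaves missing entries as missing values.
    split_rows = series.str.split(',').tolist()
    # Convert only the rows that were actually split into lists.
    return [
        list(map(int, row)) if isinstance(row, list) else row
        for row in split_rows
    ]


df = pd.DataFrame({'col1': ['1,2,3', '4,5,6', None]})
result = parse_int_lists(df['col1'])
print(result)
```

Whether this beats apply by a useful margin depends on the data, so it is worth timing both on a sample of the real column before committing to either.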