python - Convert column with mixed text values and None to integer lists efficiently
Imagine I have a column with the values
data = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
I want the output to be:
[[[1,2,3]], [[4,5,6]], [None]]
In other words, splitting the comma-delimited strings into lists while ignoring the None values.
This function works fine with apply:
def parse_text_vector(s):
    if s is None:
        return None
    else:
        return map(int, s.split(','))
As in this example:
df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])
result = df[0].apply(parse_text_vector)
But across millions of rows, this gets quite slow. I was hoping to improve the runtime by doing something along the lines of
parse_text_vector(df.values)
but that leads to:
In [61]: parse_text_vector(df.values)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-61-527d5f9f2b84> in <module>()
----> 1 parse_text_vector(df.values)

<ipython-input-49-09dcd8f24ab3> in parse_text_vector(s)
      4         return None
      5     else:
----> 6         return map(int, s.split(','))

AttributeError: 'numpy.ndarray' object has no attribute 'split'
How can I get this to work? Or otherwise optimize it so that it doesn't take tens of minutes to process my million-line dataframe?
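A small sketch of why the call in the traceback fails, using the same sample frame as the question: df.values is a 2-D NumPy array, so the function receives the whole array rather than one string, and only the individual cells have a split method.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame([['1,2,3'], ['4,5,6'], [None]])

values = df.values

print(type(values).__name__)            # ndarray
print(values.shape)                     # (3, 1)
print(hasattr(values, 'split'))         # False: the array itself cannot be split
print(hasattr(values[0, 0], 'split'))   # True: a single cell is a str
```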
Use str.split on the column (for example df.col1.str.split), and then convert the result to a list:
In [9]: df
Out[9]:
    col1
0  1,2,3
1  4,5,6
2   None

In [10]: df.col1.str.split(',').tolist()
Out[10]: [['1', '2', '3'], ['4', '5', '6'], None]
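The reason no explicit None check is needed at this step is that the .str methods skip missing entries. A minimal sketch, assuming an object-dtype column named col1 as in the session above; depending on the pandas version the missing row may come back as None or as NaN, so pd.isna is used to check it.

```python
import pandas as pd

# Same sample data; dtype=object is an assumption made here so the column
# holds plain Python strings regardless of pandas version.
df = pd.DataFrame({'col1': ['1,2,3', '4,5,6', None]}, dtype=object)

as_list = df['col1'].str.split(',').tolist()

# String rows become lists of strings; the missing row is left missing.
print(as_list[:2])          # [['1', '2', '3'], ['4', '5', '6']]
print(pd.isna(as_list[2]))  # True
```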
To convert the inner list elements to integers, you can do the conversion with map inside a list comprehension:
In [22]: [list(map(int, x)) if isinstance(x, list) else x for x in df.col1.str.split(',').tolist()]
Out[22]: [[1, 2, 3], [4, 5, 6], None]
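The two steps can be wrapped into one helper. This is only a sketch: parse_text_column is a hypothetical name, and unlike the comprehension above it maps every non-list entry to None, so a NaN produced for a missing row is normalized as well.

```python
import pandas as pd


def parse_text_column(col):
    """Split comma-delimited strings into lists of ints; missing rows become None."""
    parts = col.str.split(',').tolist()
    # Anything that is not a list (None or NaN) is treated as missing.
    return [list(map(int, p)) if isinstance(p, list) else None for p in parts]


# Sample data from the question; dtype=object is an assumption made here.
df = pd.DataFrame({'col1': ['1,2,3', '4,5,6', None]}, dtype=object)
result = parse_text_column(df['col1'])
print(result)  # [[1, 2, 3], [4, 5, 6], None]
```

Wrapping map in list matters on Python 3, where map returns a lazy iterator rather than a list.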