python - subselection of columns in dask (from pandas) by a computed boolean indexer
I'm new to dask (imported as dd) and am trying to convert pandas (imported as pd) code.
The goal of the following lines is to slice the data down to those columns whose values fulfill a calculated requirement, in dask.
There is a given table in CSV. The former code reads
inputdata=pd.read_csv("inputfile.csv")
pseudoa=inputdata.quantile([.035,.965])
pseudob=pseudoa.diff().loc[.965]
inputdata=inputdata.loc[:,inputdata.columns[pseudob.values>0]]
inputdata.describe()
and is working fine. The simple idea for the conversion was to substitute the first line with
inputdata=dd.read_csv("inputfile.csv");
but that resulted in the strange error message IndexError: too many indices for array
. Even when switching to already computed data for inputdata
and pseudob
the error remains.
Maybe the question comes down to the idea of slicing dask columns by a calculated boolean indexer.
I found a (maybe suboptimal) way (not a solution) to do that. Changing line 4 to the following
inputdata=inputdata.loc[:,inputdata.columns[(pseudob.values>0).compute()[0]]]
seems to work.
Yes, dask.dataframe's .loc
accessor only works if it gets concrete indexing values; otherwise it doesn't know which partitions to ask for the data. Computing the lazy dask result into a concrete pandas result is one sensible solution to the problem, provided the indices fit in memory.
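For illustration, here is a minimal sketch of that approach, assuming the same "inputfile.csv" and the same quantile-based mask as in the question: the lazy boolean result is materialised with .compute() before any indexing, so .loc only ever sees concrete column labels.

import dask.dataframe as dd

# Build the mask lazily, mirroring the original pandas code
inputdata = dd.read_csv("inputfile.csv")
pseudoa = inputdata.quantile([.035, .965])
pseudob = pseudoa.diff().loc[.965]

# Turn the lazy boolean result into a concrete pandas Series,
# then derive the list of column labels to keep
mask = (pseudob > 0).compute()
keep = mask[mask].index.tolist()

# .loc now receives plain labels and can slice the columns
inputdata = inputdata.loc[:, keep]
print(inputdata.describe().compute())

Whether you select the columns via .loc[:, keep] or plain bracket indexing is a matter of taste; the essential step is that the mask is computed to pandas before it is used.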