Python scikit-learn and pandas: categorical data


I'm working on a multivariable regression from a CSV, predicting crop performance based on multiple factors. Some of my columns are numerical and meaningful. Others are numerical but categorical, or strings that are categorical (for instance, crop variety, or plot code or whatever). How do I teach Python to use them? I've found the one-hot encoder (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.onehotencoder.html#sklearn.preprocessing.onehotencoder), but I don't understand how to apply it here.

My code so far:

    import pandas as pd
    import statsmodels.api as sm
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv('filepath.csv')

    # drop rows where the label is missing
    df.drop(df[df['labeleddatacolumn'].isnull()].index.tolist(), inplace=True)

    scale = StandardScaler()

    pd.options.mode.chained_assignment = None  # default='warn'

    x = df[['inputcolumn1', 'inputcolumn2', ..., 'inputcolumn20']]
    y = df['labeleddatacolumn']

    # standardize the inputs; .to_numpy() replaces .as_matrix(), which pandas 1.0 removed
    x[['inputcolumn1', 'inputcolumn2', ..., 'inputcolumn20']] = scale.fit_transform(x[['inputcolumn1', 'inputcolumn2', ..., 'inputcolumn20']].to_numpy())

    #print (x)

    est = sm.OLS(y, x).fit()

    est.summary()

You can use the get_dummies function that pandas provides to convert the categorical values.
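To see what get_dummies produces, here is a small sketch on hypothetical toy data: each distinct value becomes its own 0/1 column. Integer-coded categories such as a plot code are only encoded when listed explicitly in `columns=`, because by default get_dummies converts only object, string, and category columns. The `dtype` parameter assumes pandas 0.23 or newer.

```python
import pandas as pd

# Hypothetical toy data: one string categorical, one integer-coded categorical
data = pd.DataFrame({
    "variety": ["A", "B", "A", "C"],
    "plot_code": [101, 102, 101, 103],
    "rainfall": [10.0, 12.5, 9.0, 14.0],
})

# One 0/1 column per distinct value; dtype=float keeps the result numeric
variety_dummies = pd.get_dummies(data["variety"], prefix="variety", dtype=float)

# Listing columns explicitly also encodes integer codes such as plot_code,
# which get_dummies would otherwise leave alone because they are numeric
encoded = pd.get_dummies(data, columns=["variety", "plot_code"], dtype=float)
print(encoded)
```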

Something like this:

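    # data is your DataFrame (df in the question);
    # dtype=float keeps the dummies numeric (newer pandas returns bool columns otherwise)
    predictor = pd.concat([data.get(['numerical_column_1', 'numerical_column_2', 'label']),
                           pd.get_dummies(data['categorical_column1'], prefix='categorical_col1', dtype=float),
                           pd.get_dummies(data['categorical_column2'], prefix='categorical_col2', dtype=float)],
                          axis=1)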

Then separate out the outcome/label column by doing

    outcome = predictor['label']
    del predictor['label']

Then call the model on the data by doing

    est = sm.OLS(outcome, predictor).fit()
