python - Sklearn | LinearRegression | Fit -
I'm having a few issues with the LinearRegression algorithm in scikit-learn. I have trawled through the forums and Googled a lot, but for some reason I haven't managed to bypass the error. I'm using Python 3.5.
Below is what I've attempted, but I keep getting a ValueError: "Found input variables with inconsistent numbers of samples: [403, 174]"
x = df[["impressions", "clicks", "eligible_impressions", "measureable_impressions", "viewable_impressions"]].values
y = df["total_conversions"].values.reshape(-1, 1)
print("the shape of x {}".format(x.shape))
print("the shape of y {}".format(y.shape))

the shape of x (577, 5)
the shape of y (577, 1)

x_train, y_train, x_test, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
linreg = LinearRegression()
linreg.fit(x_train, y_train)
y_pred = linreg.predict(x_test)
print(y_pred)
print("the shape of x_train {}".format(x_train.shape))
print("the shape of y_train {}".format(y_train.shape))
print("the shape of x_test {}".format(x_test.shape))
print("the shape of y_test {}".format(y_test.shape))

the shape of x_train (403, 5)
the shape of y_train (174, 5)
the shape of x_test (403, 1)
the shape of y_test (174, 1)
Am I missing something glaringly obvious?
Any help would be appreciated.
Kind regards, Adrian
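It looks like your train and test sets contain different numbers of rows for x and y. That is because you're storing the return values of train_test_split() in the incorrect order. The function returns both splits of the first array, then both splits of the second, so the second name you unpack receives the 174 test rows of x rather than the 403 training rows of y.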
Change this:
x_train, y_train, x_test, y_test = train_test_split(x, y, test_size=0.3, random_state = 42)
to this:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state = 42)
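To check the corrected unpacking order end to end, here is a minimal sketch. It assumes random arrays as a stand-in for the question's DataFrame (same 577 rows, 5 feature columns, 1 target column), since the original data isn't available; the assertions are just an illustrative sanity check, not something scikit-learn requires.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic data standing in for df, with the shapes from the question.
rng = np.random.default_rng(42)
x = rng.random((577, 5))
y = rng.random((577, 1))

# train_test_split returns the splits of each input array in turn:
# (x_train, x_test) for the first array, then (y_train, y_test) for the second.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

# Features and targets must have the same number of rows before fitting.
assert x_train.shape[0] == y_train.shape[0]
assert x_test.shape[0] == y_test.shape[0]

linreg = LinearRegression()
linreg.fit(x_train, y_train)
y_pred = linreg.predict(x_test)

print("x_train", x_train.shape, "y_train", y_train.shape)
print("x_test", x_test.shape, "y_test", y_test.shape)
print("y_pred", y_pred.shape)
```

With this order, x_train and y_train both have 403 rows and x_test and y_test both have 174, so fit() no longer sees mismatched sample counts.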