Model Engineering — select and evaluate linear / non-linear models
From the previous post, we know that:
- Because the dataset is non-linear, linear regression is unlikely to return meaningful results. Still, it is always good to have a baseline, so the linear models below are tried first to provide one.
- Use the non-linear regression algorithms MLP Regressor and Random Forest Regressor.
- Use GridSearchCV with 10-fold cross-validation (cv=10) to tune each model's hyperparameters.
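As a rough illustration of the plan above, here is a minimal sketch that runs a 10-fold GridSearchCV for a linear baseline and for a Random Forest. The synthetic data from `make_friedman1`, the `candidates` dictionary and the tiny parameter grids are all assumptions made for this example; they are not the dataset or the grids used in this series. The MLP Regressor is left out only to keep the run short.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic non-linear data (an assumption; it stands in for the real dataset).
X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical, deliberately tiny grids so the search finishes quickly.
candidates = {
    "linear_baseline": (LinearRegression(), {"fit_intercept": [True, False]}),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [20, 50], "max_depth": [None, 5]},
    ),
}

searches = {}
for name, (estimator, param_grid) in candidates.items():
    # cv=10 runs 10-fold cross-validation for every parameter combination.
    gs = GridSearchCV(estimator, param_grid, cv=10)
    gs.fit(X_train, y_train)
    searches[name] = gs
    # For regressors, score() reports R^2 by default.
    print(name, gs.best_params_, "test R^2:", round(gs.score(X_test, y_test), 3))
```

Keeping the baseline and the non-linear model in the same loop means both are tuned and scored in exactly the same way, which makes the comparison fair.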
— — — — — —USE OF LINEAR MODELS— — — — — —
Use of Linear Regression, SGDRegressor and Support Vector Regression:
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

def fit_model_cross_validate(algorithm, parameters, X_train, y_train, X_test, y_test):
    # grid search over `parameters` with 10-fold cross-validation
    gs = GridSearchCV(algorithm, parameters, cv=10, verbose=10)
    gs.fit(X_train, y_train)
    print('Best parameters:', gs.best_params_)
    print('Score on test set:', gs.score(X_test, y_test))
    # return both the model and the predictions
    return gs, gs.predict(X_test)

def evaluate_model(y_test, y_pred):
    # mean_squared_error compares the true targets with the predictions
    print('MSE', mean_squared_error(y_test, y_pred))
def check_goodness_of_fit(model, X_train, y_train, X_test, y_test)…