Model Engineering — random forest parameter tuning
From the last post, we know that:
- The random forest algorithm can be used for regression problems
- It typically provides very high accuracy
But that same flexibility is also why a random forest can overfit.
In this post, we will use GridSearchCV with cross-validation to tune the parameters and improve model performance.
# print out the default parameters of the cross-validated model from the last post
from pprint import pprint
pprint(rand_model_pca.get_params())
Output:
{'cv': 10,
'error_score': nan,
'estimator': RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
max_depth=None, max_features='auto', max_leaf_nodes=None,
max_samples=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=100, n_jobs=None, oob_score=False,
random_state=None, verbose=0, warm_start=False),
'estimator__bootstrap': True,
'estimator__ccp_alpha': 0.0,
'estimator__criterion': 'mse',
'estimator__max_depth': None,
'estimator__max_features': 'auto'…
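With the defaults visible, GridSearchCV can search over a small grid of candidate values. The sketch below runs on synthetic data; the grid values, dataset, and variable names are illustrative assumptions, not the post's actual setup (only `cv=10` mirrors the output above):

```python
# Minimal sketch: tuning a RandomForestRegressor with GridSearchCV.
# The parameter grid and synthetic data are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# toy regression data standing in for the post's PCA-transformed features
X, y = make_regression(n_samples=200, n_features=10, random_state=42)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2],
}

grid = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=10,  # 10-fold cross-validation, as in the printed parameters
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)
grid.fit(X, y)

print(grid.best_params_)  # the best combination found in the grid
print(grid.best_score_)   # mean CV score (negated MSE, so closer to 0 is better)
```

Constraining `max_depth` and raising `min_samples_leaf` are the usual levers against overfitting here, since both limit how closely individual trees can memorize the training data.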