
How to fine-tune a random forest model?

Hyperparameter Tuning Processes
There are various ways of performing hyperparameter tuning. After the base model has been created and evaluated, its hyperparameters can be tuned to improve specific metrics such as accuracy or F1 score.
 
Check for overfitting and bias-variance errors before and after the adjustments, and tune the model according to the actual requirements. An overfit model can be very sensitive to fluctuations in the validation data, so compare the cross-validation scores and their standard deviation before and after tuning to detect possible overfitting.
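As a rough sketch, this check could look as follows (X and y are placeholders for the prepared feature matrix and target, not names from the original example):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# X and y are placeholders for the prepared feature matrix and target
base_rf = RandomForestRegressor(random_state=42)

# 5-fold cross-validation scores of the untuned base model
scores = cross_val_score(base_rf, X, y, cv=5)

# A high standard deviation across folds suggests the model is sensitive
# to data fluctuations (a possible overfit); repeat after tuning and compare
print("CV mean:", scores.mean(), "CV std:", scores.std())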
 
1. Randomized Search CV
We can use scikit-learn's RandomizedSearchCV: we define a grid, and the random forest model is fitted repeatedly with parameter combinations sampled at random from that grid. This does not guarantee the globally best parameters, but it does return the best of the models that were actually fitted and tested, which is a good starting point.
 
Source Code:
 
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Create a search grid of parameters that will be sampled from
random_grid = {
    'bootstrap': [True],
    'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
    'max_features': ['auto', 'sqrt'],
    'min_samples_leaf': [1, 2, 4],
    'min_samples_split': [2, 5, 10],
    'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
}

# Use the random grid to search for the best hyperparameters
rf = RandomForestRegressor()  # create the base model

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=5, verbose=2,
                               random_state=42, n_jobs=-1)

rf_random.fit(train_features, train_labels)  # fit starts the search and training process
 
The randomized search evaluates 100 randomly sampled parameter combinations with 5-fold cross-validation and keeps the best one it finds.
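After fitting, the best sampled combination and the corresponding refitted model can be read off directly:

rf_random.best_params_                    # best parameter combination found by the random search
best_random = rf_random.best_estimator_   # model refitted on the full training data with those parameters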
 
2. Grid Search CV
Grid search is used after randomized search to narrow the range and pin down the best hyperparameters. Now that we know roughly where to focus, we can explicitly enumerate those parameter values with grid search, evaluate every combination, and obtain the final value for each hyperparameter.
 
Source Code:
 
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Create the parameter grid based on the results of the random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}

# Create the base model
rf = RandomForestRegressor()

# Instantiate the grid search model
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)
 
# Fit the grid search to the data
grid_search.fit(train_features, train_labels)

grid_search.best_params_

Results after execution:

{'bootstrap': True,
 'max_depth': 80,
 'max_features': 3,
 'min_samples_leaf': 5,
 'min_samples_split': 12,
 'n_estimators': 100}

best_grid = grid_search.best_estimator_
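As a final step, the tuned model can be compared against the base model. A minimal sketch, assuming a held-out test set (test_features and test_labels are illustrative names, not part of the original example):

# Hypothetical held-out test set: test_features, test_labels
base_model = RandomForestRegressor(random_state=42)
base_model.fit(train_features, train_labels)

# .score() returns R^2 for regressors; the tuned model should score at least as well
print("Base model R^2: ", base_model.score(test_features, test_labels))
print("Tuned model R^2:", best_grid.score(test_features, test_labels))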