Bayesian Optimization for Random Forest

This function tunes the hyperparameters of a Random Forest using Bayesian optimization.

rf_opt(train_data, train_label, test_data, test_label, num_tree = 500L,
  mtry_range = c(1L, ncol(train_data) - 1), min_node_size_range = c(1L,
  as.integer(sqrt(nrow(train_data)))), init_points = 4, n_iter = 10,
  acq = "ei", kappa = 2.576, eps = 0, optkernel = list(type =
  "exponential", power = 2))
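The default search ranges in the signature above are computed from the training data. As a minimal sketch, the built-in iris data set is used here as a stand-in for train_data (an assumption for illustration only; it has four feature columns plus the label column):

```r
# Built-in iris stands in for train_data: 4 feature columns plus the label column
train_data <- iris

# Same expressions as the defaults in the rf_opt() signature
mtry_range <- c(1L, ncol(train_data) - 1)
min_node_size_range <- c(1L, as.integer(sqrt(nrow(train_data))))

mtry_range           # 1 4
min_node_size_range  # 1 12
```

The upper bound of min_node_size_range is truncated to an integer, so 150 rows give sqrt(150) = 12.2..., which becomes 12.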

Arguments

train_data	A data frame for training of Random Forest
train_label	The column of class to classify in the training data
test_data	A data frame for testing of Random Forest
test_label	The column of class to classify in the test data
num_tree	The number of trees in the forest. Defaults to 500 (this value is not optimized).
mtry_range	The range of mtry values to be tested. Defaults to 1 through the number of features.
min_node_size_range	The range of minimum node sizes to be tested. The default minimum is 1 and the default maximum is sqrt(nrow(train_data)).
init_points	Number of randomly chosen points at which to sample the target function before Bayesian Optimization fits the Gaussian Process.
n_iter	Total number of times the Bayesian Optimization is to be repeated.
acq	Acquisition function type to be used. Can be "ucb" (GP Upper Confidence Bound), "ei" (Expected Improvement) or "poi" (Probability of Improvement).
kappa	Tunable parameter kappa of the GP Upper Confidence Bound, which balances exploitation against exploration. Increasing kappa makes the search favor exploration.
eps	Tunable parameter epsilon of Expected Improvement and Probability of Improvement, which balances exploitation against exploration. Increasing epsilon spreads the sampled hyperparameters more widely across the whole range.
optkernel	Kernel (also known as correlation function) for the underlying Gaussian Process. This parameter should be a list that specifies the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (default) or Matérn 5/2.
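To make the roles of acq, kappa and eps concrete, the following is a minimal sketch of the three acquisition functions in their standard textbook form for a maximization problem. The function names, and the inputs mu, sigma and y_best (posterior mean, posterior standard deviation and best value observed so far), are hypothetical; whether the package computes the scores in exactly this form is an assumption.

```r
# Acquisition scores for a maximization problem, given the GP posterior
# mean `mu` and standard deviation `sigma` at candidate points.
# `y_best` is the best objective value observed so far.
# All names here are illustrative, not part of the package API.

# GP Upper Confidence Bound: mean plus kappa times the uncertainty
acq_ucb <- function(mu, sigma, kappa = 2.576) {
  mu + kappa * sigma
}

# Expected Improvement over y_best, with eps as an extra margin
acq_ei <- function(mu, sigma, y_best, eps = 0) {
  z <- (mu - y_best - eps) / sigma
  (mu - y_best - eps) * pnorm(z) + sigma * dnorm(z)
}

# Probability of Improvement over y_best, with eps as an extra margin
acq_poi <- function(mu, sigma, y_best, eps = 0) {
  pnorm((mu - y_best - eps) / sigma)
}

# Two hypothetical candidates: the first has the higher predicted mean,
# the second has the higher uncertainty.
mu <- c(0.95, 0.90)
sigma <- c(0.01, 0.05)
y_best <- 0.96

acq_ucb(mu, sigma, kappa = 0)      # pure exploitation: first candidate scores higher
acq_ucb(mu, sigma, kappa = 2.576)  # larger kappa: the uncertain candidate scores higher
acq_ei(mu, sigma, y_best)
acq_poi(mu, sigma, y_best)
```

In each case the next point to evaluate is the candidate with the highest score, so raising kappa or eps shifts the choice toward points where the Gaussian Process is less certain.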

Value

The test accuracy and a list of Bayesian Optimization results are returned:

Best_Par a named vector of the best hyperparameter set found
Best_Value the value of the metric achieved by the best hyperparameter set
History a data.table of the Bayesian optimization history
Pred a data.table with the validation/cross-validation predictions for each round of the Bayesian optimization history

Examples

library(MlBayesOpt)

set.seed(71)
res0 <- rf_opt(train_data = iris_train,
               train_label = Species,
               test_data = iris_test,
               test_label = Species,
               mtry_range = c(1L, ncol(iris_train) - 1),
               num_tree = 10L,
               init_points = 10,
               n_iter = 1)
#> elapsed = 0.00	Round = 1	mtry_opt = 1.9988	min_node_size = 2.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 2	mtry_opt = 2.6653	min_node_size = 6.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 3	mtry_opt = 1.9821	min_node_size = 2.0000	Value = 0.9600 
#> elapsed = 0.00	Round = 4	mtry_opt = 1.6350	min_node_size = 6.0000	Value = 0.9600 
#> elapsed = 0.00	Round = 5	mtry_opt = 1.9484	min_node_size = 7.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 6	mtry_opt = 3.8418	min_node_size = 6.0000	Value = 0.9600 
#> elapsed = 0.00	Round = 7	mtry_opt = 2.9851	min_node_size = 2.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 8	mtry_opt = 3.6683	min_node_size = 7.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 9	mtry_opt = 2.0140	min_node_size = 7.0000	Value = 0.9467 
#> elapsed = 0.00	Round = 10	mtry_opt = 2.3043	min_node_size = 7.0000	Value = 0.9333 
#> elapsed = 0.00	Round = 11	mtry_opt = 2.4541	min_node_size = 5.0000	Value = 0.9467 
#> 
#>  Best Parameters Found: 
#> Round = 3	mtry_opt = 1.9821	min_node_size = 2.0000	Value = 0.9600
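A variation on the call above, shown as a sketch rather than a tested recipe: it switches the acquisition function to "ucb" and the kernel to Matérn 5/2. The kernel list list(type = "matern", nu = 5 / 2) follows the convention of the corr argument of GPfit::GP_fit; that rf_opt forwards optkernel unchanged to that function is an assumption. The final lines assume the components listed under Value can be accessed with `$`.

```r
library(MlBayesOpt)

set.seed(71)
# Same data as the example above, but with the UCB acquisition function
# and a Matern 5/2 kernel instead of the defaults.
res1 <- rf_opt(train_data = iris_train,
               train_label = Species,
               test_data = iris_test,
               test_label = Species,
               num_tree = 10L,
               init_points = 10,
               n_iter = 1,
               acq = "ucb",
               kappa = 2.576,
               # assumed GPfit-style kernel specification
               optkernel = list(type = "matern", nu = 5 / 2))

# Components listed under Value (assumed to be accessible with `$`)
res1$Best_Par
res1$Best_Value
```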
