This function tunes the hyperparameters of a Random Forest using Bayesian optimization.

rf_opt(train_data, train_label, test_data, test_label, num_tree = 500L,
  mtry_range = c(1L, ncol(train_data) - 1), min_node_size_range = c(1L,
  as.integer(sqrt(nrow(train_data)))), init_points = 4, n_iter = 10,
  acq = "ei", kappa = 2.576, eps = 0, optkernel = list(type =
  "exponential", power = 2))

Arguments

train_data

A data frame for training of Random Forest

train_label

The column of class labels to predict in the training data

test_data

A data frame for testing of Random Forest

test_label

The column of class labels to predict in the test data

num_tree

The number of trees in the forest. Defaults to 500. This value is fixed and is not optimized.

mtry_range

The range of mtry values to be tested. Defaults to 1 through the number of features (ncol(train_data) - 1).

min_node_size_range

The range of minimum node sizes to be tested. Defaults to a minimum of 1 and a maximum of as.integer(sqrt(nrow(train_data))).

init_points

Number of randomly chosen points at which to sample the target function before Bayesian optimization fits the Gaussian Process.

n_iter

Total number of times the Bayesian optimization step is to be repeated after the initial points have been sampled.

acq

Acquisition function type to be used. Can be "ucb", "ei" or "poi".

  • ucb GP Upper Confidence Bound

  • ei Expected Improvement

  • poi Probability of Improvement

kappa

Tunable parameter kappa of the GP Upper Confidence Bound, which balances exploitation against exploration. Increasing kappa pushes the search for hyperparameters toward exploration.

eps

Tunable parameter epsilon of Expected Improvement and Probability of Improvement, which balances exploitation against exploration. Increasing epsilon spreads the sampled hyperparameters more widely across the whole range.
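The roles of kappa and eps can be illustrated with the standard textbook forms of the three acquisition functions for a maximization problem. This is a minimal sketch in base R, not the package's internal code: the helper names (acq_ucb, acq_ei, acq_poi) and the candidate values are hypothetical, and mu and sigma stand for the Gaussian Process posterior mean and standard deviation at each candidate point.

```r
# Textbook acquisition functions (maximization). Illustrative helpers only;
# they are NOT exported by MlBayesOpt. Assumes sigma > 0.
#   mu, sigma: GP posterior mean and standard deviation at candidate points
#   y_best:    best objective value observed so far

acq_ucb <- function(mu, sigma, kappa = 2.576) {
  # Larger kappa gives more weight to uncertainty (exploration)
  mu + kappa * sigma
}

acq_ei <- function(mu, sigma, y_best, eps = 0) {
  # eps raises the bar a candidate must clear to count as an improvement
  z <- (mu - y_best - eps) / sigma
  (mu - y_best - eps) * pnorm(z) + sigma * dnorm(z)
}

acq_poi <- function(mu, sigma, y_best, eps = 0) {
  pnorm((mu - y_best - eps) / sigma)
}

# Two hypothetical candidates: the first has a higher mean and low
# uncertainty, the second a lower mean and high uncertainty.
mu     <- c(0.95, 0.90)
sigma  <- c(0.01, 0.05)
y_best <- 0.96

ucb_low  <- acq_ucb(mu, sigma, kappa = 0.5)
ucb_high <- acq_ucb(mu, sigma, kappa = 2.576)

which.max(ucb_low)   # small kappa prefers the high-mean candidate
which.max(ucb_high)  # large kappa prefers the uncertain candidate
```

With the small kappa the first candidate scores highest, while the default kappa of 2.576 shifts the choice to the more uncertain second candidate.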

optkernel

Kernel (also known as the correlation function) for the underlying Gaussian Process. This parameter should be a list that specifies the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (default) or Matérn 5/2.
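For reference, the two kernel families named above are commonly written as follows. These are the generic textbook forms, given as an assumption about what the options mean; the exact parameterization used by the underlying Gaussian Process code may differ. Here p corresponds to the power entry of the list (p = 2 gives the squared exponential), and the symbols theta_j, ell and r are introduced only for this illustration.

```latex
% Power exponential correlation; p = 2 is the squared exponential case
R_{\mathrm{exp}}(x, x') = \prod_{j=1}^{d} \exp\!\left( -\theta_j \, \lvert x_j - x'_j \rvert^{p} \right),
\qquad 0 < p \le 2

% Matern correlation with smoothness \nu = 5/2, as a function of distance r
R_{\mathrm{Matern},\,5/2}(r) = \left( 1 + \frac{\sqrt{5}\, r}{\ell} + \frac{5\, r^{2}}{3\, \ell^{2}} \right)
\exp\!\left( -\frac{\sqrt{5}\, r}{\ell} \right)
```

The squared exponential yields very smooth response surfaces, whereas Matérn 5/2 allows somewhat rougher ones.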

Value

The test accuracy and a list of Bayesian optimization results are returned:

  • Best_Par a named vector of the best hyperparameter set found

  • Best_Value the value of metrics achieved by the best hyperparameter set

  • History a data.table of the Bayesian optimization history

  • Pred a data.table with validation/cross-validation predictions for each round of the Bayesian optimization history

Examples

library(MlBayesOpt)

set.seed(71)
res0 <- rf_opt(train_data = iris_train,
               train_label = Species,
               test_data = iris_test,
               test_label = Species,
               mtry_range = c(1L, ncol(iris_train) - 1),
               num_tree = 10L,
               init_points = 10,
               n_iter = 1)
#> elapsed = 0.00 Round = 1 mtry_opt = 1.9988 min_node_size = 2.0000 Value = 0.9467
#> elapsed = 0.00 Round = 2 mtry_opt = 2.6653 min_node_size = 6.0000 Value = 0.9467
#> elapsed = 0.00 Round = 3 mtry_opt = 1.9821 min_node_size = 2.0000 Value = 0.9600
#> elapsed = 0.00 Round = 4 mtry_opt = 1.6350 min_node_size = 6.0000 Value = 0.9600
#> elapsed = 0.00 Round = 5 mtry_opt = 1.9484 min_node_size = 7.0000 Value = 0.9467
#> elapsed = 0.00 Round = 6 mtry_opt = 3.8418 min_node_size = 6.0000 Value = 0.9600
#> elapsed = 0.00 Round = 7 mtry_opt = 2.9851 min_node_size = 2.0000 Value = 0.9467
#> elapsed = 0.00 Round = 8 mtry_opt = 3.6683 min_node_size = 7.0000 Value = 0.9467
#> elapsed = 0.00 Round = 9 mtry_opt = 2.0140 min_node_size = 7.0000 Value = 0.9467
#> elapsed = 0.00 Round = 10 mtry_opt = 2.3043 min_node_size = 7.0000 Value = 0.9333
#> elapsed = 0.00 Round = 11 mtry_opt = 2.4541 min_node_size = 5.0000 Value = 0.9467
#>
#> Best Parameters Found:
#> Round = 3 mtry_opt = 1.9821 min_node_size = 2.0000 Value = 0.9600