This function estimates hyperparameters for a Random Forest model using Bayesian optimization.
rf_opt(
  train_data,
  train_label,
  test_data,
  test_label,
  num_tree = 500L,
  mtry_range = c(1L, ncol(train_data) - 1),
  min_node_size_range = c(1L, as.integer(sqrt(nrow(train_data)))),
  init_points = 4,
  n_iter = 10,
  acq = "ei",
  kappa = 2.576,
  eps = 0,
  optkernel = list(type = "exponential", power = 2)
)
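For context, rf_opt expects plain data frames whose class column is referenced by name. A minimal, hypothetical way to build such train/test frames (here called my_train and my_test, illustrative names only; the example at the end of this page uses iris_train and iris_test instead) would be a random split:

# Hypothetical data preparation sketch; my_train/my_test are illustrative names,
# not objects provided by MlBayesOpt.
set.seed(123)
idx <- sample(nrow(iris), size = floor(0.7 * nrow(iris)))
my_train <- iris[idx, ]    # training features plus the Species class column
my_test  <- iris[-idx, ]   # held-out test features plus the Species class column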
Argument | Description
---|---
train_data | A data frame for training of Random Forest
train_label | The column of the class to classify in the training data
test_data | A data frame for testing of Random Forest
test_label | The column of the class to classify in the test data
num_tree | The number of trees in the forest. Defaults to 500 (not optimized).
mtry_range | The range of mtry values to be tested. Defaults to 1 through the number of features (ncol(train_data) - 1).
min_node_size_range | The range of minimum node sizes to be tested. The default minimum is 1 and the default maximum is sqrt(nrow(train_data)).
init_points | Number of randomly chosen points used to sample the target function before Bayesian optimization fits the Gaussian process.
n_iter | Total number of times the Bayesian optimization is repeated.
acq | Acquisition function type to be used. Can be "ucb", "ei" or "poi" (see the sketch after this table).
kappa | Tunable parameter kappa of the GP Upper Confidence Bound, balancing exploitation against exploration; increasing kappa makes the optimization favour exploration.
eps | Tunable parameter epsilon of Expected Improvement and Probability of Improvement, balancing exploitation against exploration; increasing epsilon makes the sampled hyperparameters more spread out across the whole range.
optkernel | Kernel (aka correlation function) for the underlying Gaussian process. This parameter should be a list specifying the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (the default) or Matérn 5/2.
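To illustrate the search-space and acquisition arguments above, here is a hedged sketch of a call that fixes the number of trees, narrows the node-size range, and switches to the UCB acquisition with a Matérn 5/2 kernel. The specific values are arbitrary choices for illustration, and the data frames are the same iris_train/iris_test used in the example at the bottom of this page.

# Sketch: customizing the search space and acquisition settings (values are arbitrary).
library(MlBayesOpt)

set.seed(71)
res_ucb <- rf_opt(
  train_data = iris_train,
  train_label = Species,
  test_data = iris_test,
  test_label = Species,
  num_tree = 200L,                              # fixed number of trees (not optimized)
  mtry_range = c(1L, ncol(iris_train) - 1),     # candidate values for mtry
  min_node_size_range = c(1L, 10L),             # candidate minimum node sizes
  init_points = 5,                              # random points sampled before fitting the GP
  n_iter = 10,                                  # Bayesian optimization iterations
  acq = "ucb",                                  # Upper Confidence Bound acquisition
  kappa = 2.576,                                # larger kappa favours exploration
  optkernel = list(type = "matern", nu = 5/2)   # Matern 5/2 correlation function
)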
The test accuracy and a list of Bayesian optimization results are returned:
Best_Par
a named vector of the best hyperparameter set found

Best_Value
the value of metrics achieved by the best hyperparameter set

History
a data.table of the Bayesian optimization history

Pred
a data.table with validation/cross-validation predictions for each round of the Bayesian optimization history
library(MlBayesOpt)

set.seed(71)
res0 <- rf_opt(train_data = iris_train,
               train_label = Species,
               test_data = iris_test,
               test_label = Species,
               mtry_range = c(1L, ncol(iris_train) - 1),
               num_tree = 10L,
               init_points = 10,
               n_iter = 1)
#> elapsed = 0.00  Round = 1   mtry_opt = 1.9988  min_node_size = 2.0000  Value = 0.9467
#> elapsed = 0.00  Round = 2   mtry_opt = 2.6653  min_node_size = 6.0000  Value = 0.9467
#> elapsed = 0.00  Round = 3   mtry_opt = 1.9821  min_node_size = 2.0000  Value = 0.9600
#> elapsed = 0.00  Round = 4   mtry_opt = 1.6350  min_node_size = 6.0000  Value = 0.9600
#> elapsed = 0.00  Round = 5   mtry_opt = 1.9484  min_node_size = 7.0000  Value = 0.9467
#> elapsed = 0.00  Round = 6   mtry_opt = 3.8418  min_node_size = 6.0000  Value = 0.9600
#> elapsed = 0.00  Round = 7   mtry_opt = 2.9851  min_node_size = 2.0000  Value = 0.9467
#> elapsed = 0.00  Round = 8   mtry_opt = 3.6683  min_node_size = 7.0000  Value = 0.9467
#> elapsed = 0.00  Round = 9   mtry_opt = 2.0140  min_node_size = 7.0000  Value = 0.9467
#> elapsed = 0.00  Round = 10  mtry_opt = 2.3043  min_node_size = 7.0000  Value = 0.9333
#> elapsed = 0.00  Round = 11  mtry_opt = 2.4541  min_node_size = 5.0000  Value = 0.9467
#>
#>  Best Parameters Found:
#> Round = 3  mtry_opt = 1.9821  min_node_size = 2.0000  Value = 0.9600
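Given res0 from the example above, the list components documented in the Value section can be inspected directly; this is just a sketch of accessing those documented names:

# Inspect the components documented above (names follow the Value section).
res0$Best_Par     # named vector of the best hyperparameters (mtry_opt, min_node_size)
res0$Best_Value   # best test accuracy achieved (0.96 in the run above)
res0$History      # data.table of all Bayesian optimization rounds
res0$Pred         # data.table of per-round validation/cross-validation predictions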