Bayesian Optimization for XGBoost (Cross-Validation)
xgb_cv_opt(data, label, objectfun, evalmetric, n_folds,
  eta_range = c(0.1, 1L), max_depth_range = c(4L, 6L),
  nrounds_range = c(70, 160L), subsample_range = c(0.1, 1L),
  bytree_range = c(0.4, 1L), init_points = 4, n_iter = 10,
  acq = "ei", kappa = 2.576, eps = 0,
  optkernel = list(type = "exponential", power = 2),
  classes = NULL, seed = 0)
| Argument | Description |
| --- | --- |
| data | Training data. |
| label | The label for classification. |
| objectfun | Specify the learning task and the corresponding learning objective (e.g. "binary:logistic", "multi:softmax"). |
| evalmetric | Evaluation metric for validation data. Users can pass a self-defined function to it (see the sketch after this table). Default: the metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking). |
| n_folds | Number of folds (K) for cross-validation. |
| eta_range | The range of eta (default: c(0.1, 1L)). |
| max_depth_range | The range of max_depth (default: c(4L, 6L)). |
| nrounds_range | The range of nrounds (default: c(70, 160L)). |
| subsample_range | The range of the subsample rate (default: c(0.1, 1L)). |
| bytree_range | The range of the colsample_bytree rate (default: c(0.4, 1L)). |
| init_points | Number of randomly chosen points to sample the target function at before the Bayesian Optimization fits the Gaussian Process. |
| n_iter | Total number of times the Bayesian Optimization is repeated. |
| acq | Acquisition function type to be used. Can be "ucb", "ei", or "poi". |
| kappa | Tunable parameter kappa of the GP Upper Confidence Bound, balancing exploitation against exploration; increasing kappa makes the optimized hyperparameters pursue exploration. |
| eps | Tunable parameter epsilon of Expected Improvement and Probability of Improvement, balancing exploitation against exploration; increasing epsilon makes the optimized hyperparameters more spread out across the whole range. |
| optkernel | Kernel (a.k.a. correlation function) for the underlying Gaussian Process. This parameter should be a list specifying the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (the default) or Matérn 5/2. |
| classes | The number of classes. Use only with multiclass objectives. |
| seed | Random seed (default: 0). |
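As an illustration of these arguments, here is a minimal sketch of a call that narrows the default search ranges and switches the acquisition function to "ucb". The custom metric merror_eval is a hypothetical example written in xgboost's custom-evaluation convention (a function of preds and dtrain returning list(metric = , value = )); that xgb_cv_opt forwards such a function to xgboost unchanged is an assumption here.

library(MlBayesOpt)
library(xgboost)

# Hypothetical self-defined evaluation metric in xgboost's feval style.
# For "multi:softmax", preds are predicted class indices, so this is
# simply the misclassification rate.
merror_eval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(metric = "custom_error", value = mean(preds != labels))
}

set.seed(71)
res_custom <- xgb_cv_opt(data = iris,
                         label = Species,
                         objectfun = "multi:softmax",
                         evalmetric = merror_eval,     # self-defined metric (assumed supported)
                         n_folds = 5,
                         classes = 3,
                         eta_range = c(0.05, 0.5),     # narrower learning-rate search
                         max_depth_range = c(3L, 8L),  # deeper trees allowed
                         acq = "ucb",                  # Upper Confidence Bound acquisition
                         kappa = 5,                    # larger kappa -> more exploration
                         init_points = 5,
                         n_iter = 10)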
The score specified in the evalmetric option and a list of Bayesian Optimization results are returned:
Best_Par
a named vector of the best hyperparameter set found

Best_Value
the value of the metric achieved by the best hyperparameter set

History
a data.table of the Bayesian Optimization history

Pred
a data.table with validation/cross-validation predictions for each round of the Bayesian Optimization history
library(MlBayesOpt)

set.seed(71)
res0 <- xgb_cv_opt(data = iris,
                   label = Species,
                   objectfun = "multi:softmax",
                   evalmetric = "mlogloss",
                   n_folds = 3,
                   classes = 3,
                   init_points = 2,
                   n_iter = 1)
#> elapsed = 0.01  Round = 1  eta_opt = 0.7235  max_depth_opt = 6.0000  nrounds_opt = 92.0318   subsample_opt = 0.1895  bytree_opt = 0.7112  Value = -0.2811
#> elapsed = 0.01  Round = 2  eta_opt = 0.5299  max_depth_opt = 5.0000  nrounds_opt = 76.3611   subsample_opt = 0.3846  bytree_opt = 0.7972  Value = -0.2032
#> elapsed = 0.01  Round = 3  eta_opt = 0.8745  max_depth_opt = 6.0000  nrounds_opt = 152.7652  subsample_opt = 0.5126  bytree_opt = 0.4380  Value = -0.6189
#>
#> Best Parameters Found:
#> Round = 2  eta_opt = 0.5299  max_depth_opt = 5.0000  nrounds_opt = 76.3611  subsample_opt = 0.3846  bytree_opt = 0.7972  Value = -0.2032
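Continuing from the example, the elements of the returned list documented above can be inspected directly; a minimal sketch (output not shown):

res0$Best_Par       # named vector: eta_opt, max_depth_opt, nrounds_opt, ...
res0$Best_Value     # metric value achieved by the best hyperparameter set
head(res0$History)  # data.table of the optimization history, one row per round
head(res0$Pred)     # data.table of cross-validation predictions per round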