This function tunes xgboost hyperparameters using Bayesian optimization.
xgb_opt(train_data, train_label, test_data, test_label, objectfun, evalmetric, eta_range = c(0.1, 1L), max_depth_range = c(4L, 6L), nrounds_range = c(70, 160L), subsample_range = c(0.1, 1L), bytree_range = c(0.4, 1L), init_points = 4, n_iter = 10, acq = "ei", kappa = 2.576, eps = 0, optkernel = list(type = "exponential", power = 2), classes = NULL)
Argument | Description |
---|---|
train_data | A data frame for training of xgboost |
train_label | The column of class labels to classify in the training data |
test_data | A data frame for testing of xgboost |
test_label | The column of class labels to classify in the test data |
objectfun | Specify the learning task and the corresponding learning objective |
evalmetric | Evaluation metric for validation data. Users can pass a self-defined function to it. Default: the metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking). |
eta_range | The range of eta |
max_depth_range | The range of max_depth |
nrounds_range | The range of nrounds |
subsample_range | The range of subsample rate |
bytree_range | The range of colsample_bytree rate |
init_points | Number of randomly chosen points at which to sample the target function before Bayesian Optimization fits the Gaussian Process. |
n_iter | Total number of times the Bayesian Optimization is to be repeated. |
acq | Acquisition function type to be used. Can be "ucb", "ei" or "poi". |
kappa | Tunable parameter kappa of the GP Upper Confidence Bound, which balances exploitation against exploration; increasing kappa makes the search for hyperparameters favor exploration. |
eps | Tunable parameter epsilon of Expected Improvement and Probability of Improvement, which balances exploitation against exploration; increasing epsilon spreads the sampled hyperparameters more widely across the whole range. |
optkernel | Kernel (also known as correlation function) for the underlying Gaussian Process. This parameter should be a list that specifies the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (default) or Matern 5/2. |
classes | Set the number of classes. Use only with multiclass objectives. |
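
The roles of acq, kappa and eps can be illustrated with the textbook closed forms of the three acquisition functions. The sketch below is plain base R and is not the package's internal code: the function names (acq_ucb, acq_ei, acq_poi) and the input values are hypothetical, and it assumes a maximization problem with a strictly positive posterior standard deviation.

```r
# Textbook acquisition functions for a Gaussian posterior (maximization).
# mu, sigma : posterior mean and standard deviation at candidate points
# y_best    : best objective value observed so far
# All function names and input values here are illustrative assumptions.

# Upper Confidence Bound: larger kappa weights uncertainty more heavily.
acq_ucb <- function(mu, sigma, kappa = 2.576) {
  mu + kappa * sigma
}

# Expected Improvement: eps raises the bar a candidate must clear.
acq_ei <- function(mu, sigma, y_best, eps = 0) {
  z <- (mu - y_best - eps) / sigma
  (mu - y_best - eps) * pnorm(z) + sigma * dnorm(z)
}

# Probability of Improvement: chance of beating y_best + eps.
acq_poi <- function(mu, sigma, y_best, eps = 0) {
  pnorm((mu - y_best - eps) / sigma)
}

# Two made-up candidates: a confident one with a high mean,
# and an uncertain one with a lower mean.
mu    <- c(0.80, 0.70)
sigma <- c(0.01, 0.10)

acq_ucb(mu, sigma, kappa = 0)      # ranks by the mean alone (exploitation)
acq_ucb(mu, sigma, kappa = 2.576)  # the uncertain candidate now scores higher
acq_ei(mu, sigma, y_best = 0.78)
acq_poi(mu, sigma, y_best = 0.78)
```

With kappa = 0 the first candidate wins on its mean, while the default kappa = 2.576 ranks the uncertain second candidate higher. Raising eps in acq_ei or acq_poi lowers the score of candidates that only marginally beat y_best.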
The test accuracy and a list of Bayesian Optimization results are returned:
Best_Par
a named vector of the best hyperparameter set found
Best_Value
the value of metrics achieved by the best hyperparameter set
History
a data.table of the Bayesian optimization history
Pred
a data.table with validation/cross-validation predictions for each round of the Bayesian optimization history
# NOT RUN {
library(MlBayesOpt)

set.seed(71)
res0 <- xgb_opt(train_data = fashion_train,
                train_label = y,
                test_data = fashion_test,
                test_label = y,
                objectfun = "multi:softmax",
                evalmetric = "merror",
                classes = 10,
                init_points = 3,
                n_iter = 1)
# }
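
As a variation on the example above, the search ranges and the acquisition function can be set explicitly. This is only a sketch: the range values are arbitrary illustrations, not recommendations, and it assumes the returned object exposes the Best_Par and Best_Value elements described in the Value section. The call is guarded so that it runs only when MlBayesOpt is installed.

```r
# Run only if MlBayesOpt is available; the optimization itself can be slow.
if (requireNamespace("MlBayesOpt", quietly = TRUE)) {
  library(MlBayesOpt)

  set.seed(71)
  res1 <- xgb_opt(train_data = fashion_train,
                  train_label = y,
                  test_data = fashion_test,
                  test_label = y,
                  objectfun = "multi:softmax",
                  evalmetric = "merror",
                  classes = 10,
                  eta_range = c(0.1, 0.5),      # illustrative narrower range
                  max_depth_range = c(4L, 8L),  # illustrative wider range
                  init_points = 3,
                  n_iter = 1,
                  acq = "ucb",                  # use the Upper Confidence Bound
                  kappa = 2.576)

  # Inspect the best hyperparameters found and the metric they achieved
  # (element names taken from the Value section; assumed to be list entries).
  print(res1$Best_Par)
  print(res1$Best_Value)
}
```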