Bayesian Optimization for XGBoost (Cross-Validation)
xgb_cv_opt(data, label, objectfun, evalmetric, n_folds,
  eta_range = c(0.1, 1L), max_depth_range = c(4L, 6L),
  nrounds_range = c(70, 160L), subsample_range = c(0.1, 1L),
  bytree_range = c(0.4, 1L), init_points = 4, n_iter = 10,
  acq = "ei", kappa = 2.576, eps = 0,
  optkernel = list(type = "exponential", power = 2),
  classes = NULL, seed = 0)
| Argument | Description |
| --- | --- |
| data | Training data. |
| label | The label for classification. |
| objectfun | Specify the learning task and the corresponding learning objective (e.g. "binary:logistic", "multi:softmax"). |
| evalmetric | Evaluation metric for validation data. Users can pass a self-defined function to it (see the sketch after this table). Default: the metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking). |
| n_folds | Number of folds (K) for cross-validation. |
| eta_range | The range of eta (default: c(0.1, 1L)). |
| max_depth_range | The range of max_depth (default: c(4L, 6L)). |
| nrounds_range | The range of nrounds (default: c(70, 160L)). |
| subsample_range | The range of the subsample rate (default: c(0.1, 1L)). |
| bytree_range | The range of the colsample_bytree rate (default: c(0.4, 1L)). |
| init_points | Number of randomly chosen points to sample the target function at before the Bayesian Optimization fits the Gaussian Process. |
| n_iter | Total number of times the Bayesian Optimization is repeated. |
| acq | Acquisition function type to be used. Can be "ucb", "ei", or "poi". |
| kappa | Tunable parameter kappa of the GP Upper Confidence Bound, balancing exploitation against exploration; increasing kappa makes the optimized hyperparameters pursue exploration. |
| eps | Tunable parameter epsilon of Expected Improvement and Probability of Improvement, balancing exploitation against exploration; increasing epsilon makes the optimized hyperparameters more spread out across the whole range. |
| optkernel | Kernel (a.k.a. correlation function) for the underlying Gaussian Process. This parameter should be a list specifying the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (the default) or Matérn 5/2. |
| classes | The number of classes. Use only with multiclass objectives. |
| seed | Random seed (default: 0). |
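As an illustration of these arguments, here is a minimal sketch of a call that narrows the default search ranges and switches the acquisition function to "ucb". The custom metric merror_eval is a hypothetical example written in xgboost's custom-evaluation convention (a function of preds and dtrain returning list(metric = , value = )); that xgb_cv_opt forwards such a function to xgboost unchanged is an assumption here.

library(MlBayesOpt)
library(xgboost)

# Hypothetical self-defined evaluation metric in xgboost's feval style.
# For "multi:softmax", preds are predicted class indices, so this is
# simply the misclassification rate.
merror_eval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(metric = "custom_error", value = mean(preds != labels))
}

set.seed(71)
res_custom <- xgb_cv_opt(data = iris,
                         label = Species,
                         objectfun = "multi:softmax",
                         evalmetric = merror_eval,     # self-defined metric (assumed supported)
                         n_folds = 5,
                         classes = 3,
                         eta_range = c(0.05, 0.5),     # narrower learning-rate search
                         max_depth_range = c(3L, 8L),  # deeper trees allowed
                         acq = "ucb",                  # Upper Confidence Bound acquisition
                         kappa = 5,                    # larger kappa -> more exploration
                         init_points = 5,
                         n_iter = 10)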
The score specified in the evalmetric option and a list of Bayesian Optimization results are returned:
Best_Par
a named vector of the best hyperparameter set found

Best_Value
the value of the metric achieved by the best hyperparameter set

History
a data.table of the Bayesian Optimization history

Pred
a data.table with validation/cross-validation predictions for each round of the Bayesian Optimization history
library(MlBayesOpt)

set.seed(71)
res0 <- xgb_cv_opt(data = iris,
                   label = Species,
                   objectfun = "multi:softmax",
                   evalmetric = "mlogloss",
                   n_folds = 3,
                   classes = 3,
                   init_points = 2,
                   n_iter = 1)
#> elapsed = 0.01  Round = 1  eta_opt = 0.7235  max_depth_opt = 6.0000  nrounds_opt = 92.0318   subsample_opt = 0.1895  bytree_opt = 0.7112  Value = -0.2811
#> elapsed = 0.01  Round = 2  eta_opt = 0.5299  max_depth_opt = 5.0000  nrounds_opt = 76.3611   subsample_opt = 0.3846  bytree_opt = 0.7972  Value = -0.2032
#> elapsed = 0.01  Round = 3  eta_opt = 0.8745  max_depth_opt = 6.0000  nrounds_opt = 152.7652  subsample_opt = 0.5126  bytree_opt = 0.4380  Value = -0.6189
#>
#> Best Parameters Found:
#> Round = 2  eta_opt = 0.5299  max_depth_opt = 5.0000  nrounds_opt = 76.3611  subsample_opt = 0.3846  bytree_opt = 0.7972  Value = -0.2032
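Continuing from the example, the elements of the returned list documented above can be inspected directly; a minimal sketch (output not shown):

res0$Best_Par       # named vector: eta_opt, max_depth_opt, nrounds_opt, ...
res0$Best_Value     # metric value achieved by the best hyperparameter set
head(res0$History)  # data.table of the optimization history, one row per round
head(res0$Pred)     # data.table of cross-validation predictions per round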