| suggest_size {projpred} | R Documentation |
This function can suggest an appropriate submodel size based on a decision
rule described in section "Details" below. Note that this decision is quite
heuristic and should be interpreted with caution. It is recommended to
examine the results via plot.vsel() and/or summary.vsel() and to make the
final decision based on what is most appropriate for the problem at hand.
suggest_size(object, ...)

## S3 method for class 'vsel'
suggest_size(
  object,
  stat = "elpd",
  pct = 0,
  type = "upper",
  warnings = TRUE,
  ...
)
object |
An object of class |
... |
Arguments passed to |
stat |
Statistic used for the decision. See |
pct |
A number giving the proportion (a fraction, not a percentage) of the utility difference between the baseline model and the null model that one is willing to sacrifice. See section "Details" below for more information. |
type |
Either |
warnings |
Mainly for internal use. A single logical value indicating
whether to throw warnings if automatic suggestion fails. Usually there is
no reason to set this to |
The suggested model size is the smallest model size for which either
the lower or the upper bound (depending on argument type) of the
normal-approximation confidence interval for u_k - u_base is greater
than or equal to
pct * (u_0 - u_base)
where u_k denotes the k-th submodel's utility, u_base the baseline
model's utility, and u_0 the null model's utility. The confidence
interval has nominal coverage 1 - alpha (see argument alpha of
summary.vsel()). The baseline is either the reference model or the best
submodel found (see argument baseline of summary.vsel()).
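The rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not projpred's implementation: the helper suggest_size_sketch() is hypothetical, the utility differences and standard errors are made-up numbers, and returning NA when no size qualifies is a choice made for this sketch.

```r
# Hypothetical helper (NOT part of projpred): applies the decision rule to
# estimated utility differences and their standard errors.
suggest_size_sketch <- function(diff_hat, se_hat, u0_diff, pct = 0,
                                alpha = 0.32, type = "upper") {
  # diff_hat[k + 1]: estimate of u_k - u_base for sizes k = 0, 1, 2, ...
  # se_hat[k + 1]:   standard error of that estimate
  # u0_diff:         estimate of u_0 - u_base
  z <- qnorm(1 - alpha / 2)
  bound <- if (type == "upper") diff_hat + z * se_hat else diff_hat - z * se_hat
  ok <- which(bound >= pct * u0_diff)
  if (length(ok) == 0) {
    return(NA_integer_)  # assumption of this sketch: NA if no size qualifies
  }
  ok[1] - 1L  # sizes start at 0 (intercept-only model)
}

# Made-up values for sizes 0, 1, 2, 3:
diff_hat <- c(-12.0, -5.0, -1.5, -0.4)
se_hat <- c(3.0, 2.0, 1.6, 1.0)

res_upper <- suggest_size_sketch(diff_hat, se_hat, u0_diff = diff_hat[1])
res_lower <- suggest_size_sketch(diff_hat, se_hat, u0_diff = diff_hat[1],
                                 type = "lower")
res_pct <- suggest_size_sketch(diff_hat, se_hat, u0_diff = diff_hat[1],
                               pct = 0.5)
print(c(upper = res_upper, lower = res_lower, pct_0.5 = res_pct))
```

With these made-up numbers, type = "upper" selects size 2, type = "lower" finds no qualifying size, and relaxing the cutoff with pct = 0.5 selects size 1. This shows how type = "lower" is the stricter choice and how a larger pct makes the rule more lenient.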
For example, alpha = 0.32, pct = 0, and type = "upper" means that we
select the smallest model size for which the upper bound of the confidence
interval for u_k - u_base with coverage 68%
exceeds (or is equal to) zero, that is, for which the submodel's utility is
at most one standard error smaller than the baseline model's utility.
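The "one standard error" reading of this example follows from the normal quantile for 68% coverage. In the notation below, which is introduced here only for illustration, hats denote estimates, SE_k is the standard error of the estimated difference, and z_q is the standard normal q-quantile:

```latex
\[
  (\hat{u}_k - \hat{u}_{\mathrm{base}}) + z_{1-\alpha/2}\,\mathrm{SE}_k \;\ge\; 0,
  \qquad
  z_{1 - 0.32/2} = z_{0.84} \approx 0.994 \approx 1,
\]
so the condition is approximately
\[
  \hat{u}_k \;\ge\; \hat{u}_{\mathrm{base}} - \mathrm{SE}_k .
\]
```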
Loss statistics like the root mean-squared error (RMSE) and the
mean-squared error (MSE) are converted to utilities by multiplying them by
-1, so a call such as suggest_size(object, stat = "rmse", type = "upper")
finds the smallest model size whose upper confidence interval bound for the
negative RMSE or MSE exceeds the cutoff (or, equivalently, whose lower
confidence interval bound for the RMSE or MSE is below the corresponding
cutoff). This is done to make the interpretation of argument type the
same regardless of argument stat.
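The equivalence between the two formulations can be checked numerically. The following base R sketch uses made-up RMSE differences and standard errors (none of these values come from projpred) and compares the criterion on the utility scale with the mirrored criterion on the loss scale:

```r
z <- qnorm(1 - 0.32 / 2)

# Made-up values: RMSE_k - RMSE_base and standard errors for three sizes.
loss_diff <- c(0.40, 0.15, 0.05)
se_hat <- c(0.10, 0.08, 0.06)
cutoff_util <- 0  # cutoff on the utility scale (corresponds to pct = 0)

# Utility scale: the utility difference is the negated loss difference.
upper_util <- -loss_diff + z * se_hat
crit_util <- upper_util >= cutoff_util

# Loss scale: lower bound of the loss difference against the negated cutoff.
lower_loss <- loss_diff - z * se_hat
crit_loss <- lower_loss <= -cutoff_util

print(rbind(crit_util, crit_loss))
```

Both rows agree for every size, because the upper bound of the negated difference is exactly the negated lower bound of the original difference.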
The intercept is not counted by suggest_size(), so a suggested size of
zero stands for the intercept-only model.
if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )

  # Variable selection (here without cross-validation and with small values
  # for `nterms_max`, `nclusters`, and `nclusters_pred`, but only for the
  # sake of speed in this example; this is not recommended in general):
  vs <- varsel(fit, nterms_max = 3, nclusters = 5, nclusters_pred = 10,
               seed = 5555)
  print(suggest_size(vs))
}