| refmodel-init-get {projpred} | R Documentation |
Function get_refmodel() is a generic function for creating the reference
model structure from a specific object. The methods for get_refmodel()
usually call init_refmodel() which is the underlying workhorse (and may
also be used directly without a call to get_refmodel()). Some arguments are
for K-fold cross-validation (K-fold CV) only; see cv_varsel()
for the use of K-fold CV in projpred.
get_refmodel(object, ...) ## S3 method for class 'refmodel' get_refmodel(object, ...) ## S3 method for class 'vsel' get_refmodel(object, ...) ## Default S3 method: get_refmodel(object, formula, family = NULL, ...) ## S3 method for class 'stanreg' get_refmodel(object, ...) init_refmodel( object, data, formula, family, ref_predfun = NULL, div_minimizer = NULL, proj_predfun = NULL, extract_model_data, cvfun = NULL, cvfits = NULL, dis = NULL, cvrefbuilder = NULL, ... )
object |
Object from which the reference model is created. For
|
... |
For |
formula |
Reference model's formula. For general information on formulas
in R, see |
family |
A |
data |
Data used for fitting the reference model. Any |
ref_predfun |
Prediction function for the linear predictor of the
reference model, including offsets (if existing). See also section
"Arguments |
div_minimizer |
A function for minimizing the Kullback-Leibler (KL)
divergence from a submodel to the reference model (i.e., for performing the
projection of the reference model onto a submodel). The output of
|
proj_predfun |
Prediction function for the linear predictor of a
submodel onto which the reference model is projected. See also section
"Arguments |
extract_model_data |
A function for fetching some variables (response,
observation weights, offsets) from the original dataset (i.e., the dataset
used for fitting the reference model) or from a new dataset. See also
section "Argument |
cvfun |
For K-fold CV only. A function that, given a fold indices
vector, fits the reference model separately for each fold and returns the
K model fits as a |
cvfits |
For K-fold CV only. A |
dis |
A vector of posterior draws for the dispersion parameter (if
existing). May be |
cvrefbuilder |
For K-fold CV only. A function that, given a
reference model fit for fold k = 1, ..., K (this
model fit is the k-th element of the return value of |
An object that can be passed to all the functions that take the
reference model fit as the first argument, such as varsel(),
cv_varsel(), project(), proj_linpred(), and proj_predict().
Usually, the returned object is of class refmodel. However, if object
is NULL, the returned object is of class datafit as well as of class
refmodel (with datafit being first). Objects of class datafit are
handled differently at several places throughout this package.
For additive models (still an experimental feature), only mgcv::s() and
mgcv::t2() are currently supported as smooth terms. Furthermore, these need
to be called without any arguments apart from the predictor names (symbols).
For example, for smoothing the effect of a predictor x, only s(x) or
t2(x) are allowed. As another example, for smoothing the joint effect of
two predictors x and z, only s(x, z) or t2(x, z) are allowed (and
analogously for higher-order joint effects, e.g., of three predictors).
ref_predfun, proj_predfun, and div_minimizerArguments ref_predfun, proj_predfun, and div_minimizer may be NULL
for using an internal default. Otherwise, let N denote the number of
observations (in case of CV, these may be reduced to each fold),
S_ref the number of posterior draws for the reference
model's parameters, and S_prj the number of (possibly
clustered) parameter draws for projection (short: the number of projected
draws). Then the functions supplied to these arguments need to have the
following prototypes:
ref_predfun: ref_predfun(fit, newdata = NULL) where:
fit accepts the reference model fit as given in argument object
(but possibly re-fitted to a subset of the observations, as done in
K-fold CV).
newdata accepts either NULL (for using the original dataset,
typically stored in fit) or data for new observations (at least in the
form of a data.frame).
proj_predfun: proj_predfun(fits, newdata) where:
fits accepts a list of length S_prj
containing this number of submodel fits. This list is the same as that
returned by project() in its output element submodl (which in turn is
the same as the return value of div_minimizer, except if project()
was used with an object of class vsel based on an L1 search as well
as with refit_prj = FALSE).
newdata accepts data for new observations (at least in the form of a
data.frame).
div_minimizer does not need to have a specific prototype, but it needs to
be able to be called with the following arguments:
formula accepts either a standard formula with a single response
(if S_prj = 1) or a formula with
S_prj > 1 response variables cbind()-ed on
the left-hand side in which case the projection has to be performed for
each of the response variables separately.
data accepts a data.frame to be used for the projection.
family accepts a family object.
weights accepts either observation weights (at least in the form of a
numeric vector) or NULL (for using a vector of ones as weights).
projpred_var accepts an N x S_prj
matrix of predictive variances (necessary for projpred's internal
GLM fitter).
projpred_regul accepts a single numeric value as supplied to argument
regul of project(), for example.
... accepts further arguments specified by the user.
The return value of these functions needs to be:
ref_predfun: an N x S_ref matrix.
proj_predfun: an N x S_prj matrix.
div_minimizer: a list of length S_prj containing
this number of submodel fits.
extract_model_dataThe function supplied to argument extract_model_data needs to have the
prototype
extract_model_data(object, newdata, wrhs = NULL, orhs = NULL, extract_y = TRUE)
where:
object accepts the reference model fit as given in argument object (but
possibly re-fitted to a subset of the observations, as done in K-fold
CV).
newdata accepts either NULL (for using the original dataset, typically
stored in object) or data for new observations (at least in the form of a
data.frame).
wrhs accepts at least either NULL (for using a vector of ones) or a
right-hand side formula consisting only of the variable in newdata
containing the weights.
orhs accepts at least either NULL (for using a vector of zeros) or a
right-hand side formula consisting only of the variable in newdata
containing the offsets.
extract_y accepts a single logical value indicating whether output
element y (see below) shall be NULL (TRUE) or not (FALSE).
The return value of extract_model_data needs to be a list with elements
y, weights, and offset, each being a numeric vector containing the data
for the response, the observation weights, and the offsets, respectively. An
exception is that y may also be NULL (depending on argument extract_y).
if (requireNamespace("rstanarm", quietly = TRUE)) {
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
# The "stanreg" fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Define the reference model explicitly:
ref <- get_refmodel(fit)
print(class(ref)) # gives `"refmodel"`
# Now see, for example, `?varsel`, `?cv_varsel`, and `?project` for
# possible post-processing functions. Most of the post-processing functions
# call get_refmodel() internally at the beginning, so you will rarely need
# to call get_refmodel() yourself.
# A custom reference model which may be used in a variable selection where
# the candidate predictors are not a subset of those used for the reference
# model's predictions:
ref_cust <- init_refmodel(
fit,
data = dat_gauss,
formula = y ~ X6 + X7,
family = gaussian(),
extract_model_data = function(object, newdata = NULL, wrhs = NULL,
orhs = NULL, extract_y = TRUE) {
if (!extract_y) {
resp_form <- NULL
} else {
resp_form <- ~ y
}
if (is.null(newdata)) {
newdata <- dat_gauss
}
args <- projpred:::nlist(object, newdata, wrhs, orhs, resp_form)
return(projpred::do_call(projpred:::.extract_model_data, args))
},
cvfun = function(folds) {
kfold(
fit, K = max(folds), save_fits = TRUE, folds = folds, cores = 1
)$fits[, "fit"]
},
dis = as.matrix(fit)[, "sigma"]
)
# Now, the post-processing functions mentioned above (for example,
# varsel(), cv_varsel(), and project()) may be applied to `ref_cust`.
}