| diagnosticPlot {robustHD} | R Documentation |
Produce diagnostic plots for a sequence of regression models, such as submodels along a robust least angle regression sequence, or sparse least trimmed squares regression models for a grid of values for the penalty parameter. Four plots are currently implemented.
diagnosticPlot(object, ...)
## S3 method for class 'seqModel'
diagnosticPlot(object, s = NA, covArgs = list(), ...)
## S3 method for class 'perrySeqModel'
diagnosticPlot(object, covArgs = list(), ...)
## S3 method for class 'tslars'
diagnosticPlot(object, p, s = NA, covArgs = list(), ...)
## S3 method for class 'sparseLTS'
diagnosticPlot(
object,
s = NA,
fit = c("reweighted", "raw", "both"),
covArgs = list(),
...
)
## S3 method for class 'perrySparseLTS'
diagnosticPlot(
object,
fit = c("reweighted", "raw", "both"),
covArgs = list(),
...
)
## S3 method for class 'setupDiagnosticPlot'
diagnosticPlot(
object,
which = c("all", "rqq", "rindex", "rfit", "rdiag"),
ask = (which == "all"),
facets = object$facets,
size = c(2, 4),
id.n = NULL,
...
)
object |
the model fit for which to produce diagnostic plots, or an
object containing all necessary information for plotting (as generated
by setupDiagnosticPlot). |
... |
additional arguments to be passed down, eventually to
|
s |
for the |
covArgs |
a list of arguments to be passed to
covMcd for the regression diagnostic plot (see “Details”). |
p |
an integer giving the lag length for which to produce the plot (the default is to use the optimal lag length). |
fit |
a character string specifying for which fit to produce
diagnostic plots. Possible values are "reweighted" (the default), "raw", or "both". |
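which |
a character string indicating which plot to show. Possible
values are "all" (the default) for all of the following, "rqq" for a normal Q-Q plot of the standardized residuals, "rindex" for a plot of the standardized residuals versus their index, "rfit" for a plot of the standardized residuals versus the fitted values, or "rdiag" for a regression diagnostic plot (standardized residuals versus robust Mahalanobis distances of the predictor variables). |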
ask |
a logical indicating whether the user should be asked before
each plot (see devAskNewPage). The default is to ask if all plots are requested and not to ask otherwise. |
facets |
a faceting formula to override the default behavior. If
supplied, |
size |
a numeric vector of length two giving the point and label size, respectively. |
id.n |
an integer giving the number of the most extreme observations to be identified by a label. The default is to use the number of identified outliers, which can be different for the different plots. See “Details” for more information. |
In the normal Q-Q plot of the standardized residuals, a reference line is
drawn through the first and third quartiles. The id.n observations
with the largest distances from that line are identified by a label (the
observation number). The default for id.n is the number of
regression outliers, i.e., the number of observations whose absolute
residuals are too large (cf. weights).
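The labelling rule for the Q-Q plot can be sketched in base R. This is only an illustration, not the package's internal code: the residuals are made up, the distance from the line is taken to be the vertical distance, and the outlier rule `abs(res) > 2.5` is an assumption standing in for the outlier weights of the fitted model.

```r
# Hypothetical standardized residuals: 47 well-behaved values plus
# three planted outliers (observations 48, 49 and 50)
res <- c(qnorm(ppoints(47)), 6, 7, -8)
n <- length(res)

# Theoretical normal quantile paired with each observation
theo <- qnorm(ppoints(n))[rank(res, ties.method = "first")]

# Reference line through the first and third quartiles
qy <- quantile(res, c(0.25, 0.75), names = FALSE)
qx <- qnorm(c(0.25, 0.75))
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

# Vertical distance of each point from the reference line (assumption)
dist <- abs(res - (intercept + slope * theo))

# Default id.n: number of regression outliers (assumed rule: |r| > 2.5)
id.n <- sum(abs(res) > 2.5)

# Observations that would receive a label
labelled <- order(dist, decreasing = TRUE)[seq_len(id.n)]
```

Here `labelled` contains the three planted observations, since they lie farthest from the quartile line.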
In the plots of the standardized residuals versus their index or the fitted
values, horizontal reference lines are drawn at 0 and +/-2.5. The
id.n observations with the largest absolute values of the
standardized residuals are identified by a label (the observation
number). The default for id.n is the number of regression outliers,
i.e., the number of observations whose absolute residuals are too large (cf.
weights).
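As a rough base R sketch of the residual plots, the following draws the three horizontal reference lines and labels the most extreme residuals. The residual values are invented for illustration, and using `abs(res) > 2.5` to count the regression outliers is an assumption; the package takes the outlier information from the model fit.

```r
# Hypothetical standardized residuals; observations 3 and 7 are outliers
res <- c(0.3, -1.1, 4.2, 0.8, -0.5, 1.9, -3.6, 0.1, -2.1, 1.2)

cutoff <- 2.5
outlier <- abs(res) > cutoff   # assumed outlier rule
id.n <- sum(outlier)           # default number of labelled points

# Indices of the id.n largest absolute residuals
labelled <- order(abs(res), decreasing = TRUE)[seq_len(id.n)]

# Residuals versus index with reference lines at 0 and +/-2.5
plot(seq_along(res), res, xlab = "Index", ylab = "Standardized residual")
abline(h = c(-cutoff, 0, cutoff), lty = 2)
text(labelled, res[labelled], labels = labelled, pos = 4)
```

Replacing the index on the horizontal axis with the fitted values gives the corresponding plot against the fitted values; the labelling rule is the same.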
For the regression diagnostic plot, the robust Mahalanobis distances of the
predictor variables are computed via the MCD based on only those predictors
with non-zero coefficients (see
covMcd). Horizontal reference lines are drawn at
+/-2.5 and a vertical reference line is drawn at the upper 97.5% quantile
of the chi-squared distribution with p degrees of
freedom, where p denotes the number of predictors with non-zero
coefficients. The id.n observations with the largest absolute values
of the standardized residuals and/or largest robust Mahalanobis distances
are identified by a label (the observation number). The default for
id.n is the number of all outliers: regression outliers (i.e.,
observations whose absolute residuals are too large, cf.
weights) and leverage points (i.e.,
observations with robust Mahalanobis distance larger than the 97.5%
quantile of the chi-squared distribution with p
degrees of freedom).
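The cutoff logic for the regression diagnostic plot can be sketched as follows. All data are hypothetical. To keep the sketch free of package dependencies, the center and scatter are estimated from rows known to be clean, which merely stands in for the MCD estimate that covMcd would provide. Note that `mahalanobis()` returns squared distances, so they are compared with the chi-squared quantile directly; on the scale of unsquared distances the equivalent cutoff is the square root of that quantile.

```r
# Hypothetical predictors; observation 10 is a leverage point
x <- cbind(
  x1 = c(-1.2, 0.4, 0.9, -0.3, 1.1, -0.8, 0.2, 0.6, -0.5, 8.0),
  x2 = c(0.5, -0.7, 1.0, 0.1, -1.3, 0.8, -0.2, 0.4, -0.6, 9.0),
  x3 = c(0.3, -0.1, 0.7, -0.9, 0.2, 0.5, -0.4, 0.0, 0.6, -0.2)
)
coefs <- c(x1 = 1.5, x2 = -0.8, x3 = 0)  # hypothetical sparse fit
# Hypothetical standardized residuals; observation 6 is a regression outlier
res <- c(0.4, -0.9, 1.3, 0.2, -1.6, 3.1, -0.3, 0.7, -0.1, 0.5)

# Keep only predictors with non-zero coefficients
active <- coefs != 0
p <- sum(active)
xa <- x[, active, drop = FALSE]

# Stand-in for the MCD: estimate from rows assumed to be clean
clean <- 1:9
center <- colMeans(xa[clean, , drop = FALSE])
scatter <- cov(xa[clean, , drop = FALSE])
d2 <- mahalanobis(xa, center, scatter)   # squared distances

# Leverage points exceed the 97.5% chi-squared quantile with p df
cutoff <- qchisq(0.975, df = p)
leverage <- d2 > cutoff

# Default id.n counts regression outliers and leverage points together
regOutlier <- abs(res) > 2.5             # assumed outlier rule
id.n <- sum(regOutlier | leverage)
```

In this toy data set, observation 10 is flagged as a leverage point and observation 6 as a regression outlier, so two points would be labelled by default.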
If only one plot is requested, an object of class "ggplot" (see
ggplot) is returned; otherwise, a list of such objects is returned.
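Because the return type depends on how many plots are requested, downstream code may want to normalize it. A minimal sketch follows; the helper name `asPlotList` is hypothetical and not part of the package.

```r
# Hypothetical helper: always return a list of plots, whether
# diagnosticPlot() produced a single "ggplot" object or a list of them
asPlotList <- function(plots) {
  if (inherits(plots, "ggplot")) list(plots) else plots
}

# Possible usage, assuming a model fit named 'fit' exists:
# plots <- asPlotList(diagnosticPlot(fit, which = "rqq"))
# for (p in plots) print(p)
```

Wrapping the result this way lets the same loop handle both `which = "all"` and a single requested plot.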
Andreas Alfons
ggplot, rlars,
grplars, rgrplars, tslarsP,
rtslarsP, tslars, rtslars,
sparseLTS, plot.lts
## generate data
# example is not high-dimensional to keep computation time low
library("robustHD") # provides rlars(), sparseLTS(), diagnosticPlot()
library("mvtnorm") # provides rmvnorm()
set.seed(1234) # for reproducibility
n <- 100 # number of observations
p <- 25 # number of variables
beta <- rep.int(c(1, 0), c(5, p-5)) # coefficients
sigma <- 0.5 # controls signal-to-noise ratio
epsilon <- 0.1 # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma) # predictor matrix
e <- rnorm(n) # error terms
i <- 1:ceiling(epsilon*n) # observations to be contaminated
e[i] <- e[i] + 5 # vertical outliers
y <- c(x %*% beta + sigma * e) # response
x[i,] <- x[i,] + 5 # bad leverage points
## robust LARS
# fit model
fitRlars <- rlars(x, y, sMax = 10)
# create plot
diagnosticPlot(fitRlars)
## sparse LTS
# fit model
fitSparseLTS <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
# create plot
diagnosticPlot(fitSparseLTS)
diagnosticPlot(fitSparseLTS, fit = "both")