| gaitpoisson.mix {VGAM} | R Documentation |
Fits a generally–altered, –inflated and –truncated Poisson regression (mixtures of Poissons on nested and/or partitioned supports). The truncation may include values in the upper tail.
gaitpoisson.mix(alter = NULL, inflate = NULL, truncate = NULL,
                max.support = Inf, zero = c("pobs.a", "pstr.i"),
                eq.ap = FALSE, eq.ip = FALSE, llambda.p = "loglink",
                lpobs.a = "logitlink", llambda.a = "loglink",
                lpstr.i = "logitlink", llambda.i = "loglink",
                type.fitted = c("mean", "pobs.a", "pstr.i", "Pobs.a", "Pstr.i",
                                "prob.a", "prob.i", "prob.t", "lhs.prob"),
                imethod = 1, ilambda.p = NULL, ilambda.a = NULL, ilambda.i = NULL,
                ipobs.a = NULL, ipstr.i = NULL, ishrinkage = 0.95, probs.y = 0.35)
alter, inflate, truncate |
Vector of altered, inflated and truncated values,
i.e., nonnegative integers.
Due to its flexibility, this function is easy to misuse,
and ideally the values of these arguments should be well
justified by the application at hand.
Adding unnecessary values to these arguments willy-nilly
is a recipe for disaster, especially for |
llambda.p, llambda.a, llambda.i |
Link functions;
the suffixes |
lpobs.a, lpstr.i |
Link functions;
See |
eq.ap, eq.ip |
Single logical each.
Constrain the rate parameters to be equal?
See |
type.fitted, max.support |
See The choice |
imethod, ipobs.a, ipstr.i |
See |
ilambda.p, ilambda.a, ilambda.i |
See |
probs.y, ishrinkage |
See |
zero |
See |
Although the full GAIT–Pois–Pois–Pois model may be fitted,
its two submodels are abbreviated
GAT–Pois–Pois and
GIT–Pois–Pois.
In each, the inner distribution for
ordinary values is the Poisson distribution, and
the outer distribution for the altered or inflated values
is another Poisson distribution which, by default,
has a different rate parameter.
Thus for the GAT model
the distribution being fitted is a (spliced) mixture
of two Poissons with differing (partitioned) support.
Likewise, for the GIT model
the distribution being fitted is a mixture
of two Poissons with nested support.
The two rate parameters may be constrained to be equal using
eq.ap or eq.ip.
For the GIT model,
by default, a logistic regression models the (structural)
probability pstr.i that the response is inflated.
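The two mixtures described above can be sketched in base R. This is only an illustration under stated assumptions: the function names dgat_mix and dgit_mix and their arguments are hypothetical (they are not part of VGAM), and the renormalisation used here is one reading of the "partitioned" and "nested" supports, not code taken from the package.

```r
# Hypothetical helpers, not VGAM functions: probability mass functions
# for the GAT and GIT two-Poisson mixtures, written with base R only.

# GAT: spliced mixture on partitioned supports.
# The outer Poisson lives only on the altered values; the inner (parent)
# Poisson lives on everything else that is not truncated.
dgat_mix <- function(y, lambda.p, lambda.a, pobs.a,
                     alter, truncate = integer(0)) {
  special <- c(alter, truncate)
  outer <- ifelse(y %in% alter,
                  dpois(y, lambda.a) / sum(dpois(alter, lambda.a)), 0)
  inner <- ifelse(y %in% special, 0,
                  dpois(y, lambda.p) / (1 - sum(dpois(special, lambda.p))))
  pobs.a * outer + (1 - pobs.a) * inner
}

# GIT: mixture on nested supports.
# The outer Poisson lives on the inflated values, which are also
# part of the inner Poisson's support.
dgit_mix <- function(y, lambda.p, lambda.i, pstr.i,
                     inflate, truncate = integer(0)) {
  outer <- ifelse(y %in% inflate,
                  dpois(y, lambda.i) / sum(dpois(inflate, lambda.i)), 0)
  inner <- ifelse(y %in% truncate, 0,
                  dpois(y, lambda.p) / (1 - sum(dpois(truncate, lambda.p))))
  pstr.i * outer + (1 - pstr.i) * inner
}

y <- 0:200  # wide enough that the Poisson tail beyond it is negligible
gat <- dgat_mix(y, lambda.p = 8, lambda.a = 6, pobs.a = 0.27,
                alter = c(5, 10), truncate = c(6, 7))
git <- dgit_mix(y, lambda.p = 8, lambda.i = 6, pstr.i = 0.27,
                inflate = c(3, 15), truncate = c(6, 7))
c(GAT = sum(gat), GIT = sum(git))  # both should be very close to 1
```

In the GAT sketch the altered values carry total probability pobs.a, with none coming from the parent Poisson. In the GIT sketch the inflated values receive mass from both components, which is why deflation or a lack of inflation is hard to estimate.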
This function currently does not handle multiple responses.
Further details are at Gaitpois.
An alternative variant of this distribution,
more unstructured in nature, is based
on the multinomial logit model—see gaitpoisson.mlm.
For the GIT model,
the ordering of the linear/additive predictors corresponds to
length(inflate) equalling 0, 1, and more than 1;
the dimension grows accordingly.
The same idea holds for the GAT model.
Apart from the order of the linear/additive predictors,
the following are (or should be) equivalent:
gaitpoisson.mix() and poissonff(),
gaitpoisson.mix(alter = 0) and zapoisson(zero = "pobs0"),
gaitpoisson.mix(inflate = 0) and zipoisson(zero = "pstr0"),
gaitpoisson.mix(truncate = 0) and pospoisson().
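The single-value special cases listed above can be checked numerically in base R. The sketch below does not call VGAM. It writes out the positive, zero-inflated and zero-altered Poisson probabilities directly, assuming the usual textbook forms of those distributions. The names lambda, pstr0 and pobs0 are chosen for illustration.

```r
lambda <- 2.5
y <- 0:100  # the Poisson tail beyond 100 is negligible for this lambda

# truncate = 0: the parent Poisson with 0 removed and the rest renormalised
# (positive Poisson).
pos <- ifelse(y == 0, 0, dpois(y, lambda) / (1 - dpois(0, lambda)))

# inflate = 0: a Poisson restricted to the single value 0 is a point mass,
# so the mixture is the zero-inflated Poisson.
pstr0 <- 0.2
zip <- pstr0 * (y == 0) + (1 - pstr0) * dpois(y, lambda)

# alter = 0: the value 0 is modelled separately (hurdle), and the remaining
# mass follows a positive Poisson (zero-altered Poisson).
pobs0 <- 0.3
zap <- ifelse(y == 0, pobs0, (1 - pobs0) * pos)

c(pos = sum(pos), zip = sum(zip), zap = sum(zap))  # each very close to 1
```

With only one altered or inflated value the outer Poisson degenerates to a point mass, so its rate parameter plays no role in these three cases.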
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
The fitted.values slot of the fitted object,
which should be extracted by the generic function fitted,
is similar to that of gaitpoisson.mlm.
Amateurs tend to be overzealous in fitting
zero-inflated models when the fitted mean is low;
the warning of ziP should be heeded,
and it applies here to all inflated values.
Fitting a GIT model requires more caution than
for the GAT hurdle model because
ideally gross inflation is needed in the data for it to work properly.
Deflation or no inflation will produce numerical problems
such as extreme coefficient values,
hence set trace = TRUE to monitor convergence.
It is often a good idea to set eq.ip = TRUE,
especially when length(inflate) is low or the values
of inflate are not spread over the range of the response.
That is, if the inflate values form a single small
cluster then this can easily create estimation difficulties—the
idea is somewhat similar to multicollinearity.
The defaults for this family function may change in
the future as more experience is obtained using it.
If length(inflate) is very low then it is probably a good
idea to set eq.ip = TRUE
so that
the estimation can borrow strength from both the inflated and
non-inflated values.
Numerical problems can easily arise because of the
flexibility of this distribution and/or the lack of
sizeable inflation; it is a good idea to
gain experience with simulated data before applying
it to real data.
See gaitpoisson.mlm for other general details.
T. W. Yee
Yee, T. W. and Ma, C. (2020). Generally–altered, –inflated and –truncated regression, with application to heaped and seeped count data. In preparation.
Gaitpois,
gaitpoisson.mlm,
gatnbinomial.mix,
zipoisson,
pospoisson,
gaitlog.mix,
CommonVGAMffArguments,
rootogram4,
simulate.vlm.
avec <- c(5, 10) # Alter these values
ivec <- c(3, 15) # Inflate these values
tvec <- c(6, 7) # Truncate these values
pobs.a <- logitlink(-1, inverse = TRUE) # About 0.27
pstr.i <- logitlink(-1, inverse = TRUE) # About 0.27
max.support <- 20; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, lambda.p = exp(2 + 0.5 * x2))
gdata <- transform(gdata,
  y1 = rgaitpois(nn, lambda.p, alter.mix = avec, pobs.mix.a = pobs.a,
                 inflate.mix = ivec, pstr.mix.i = pstr.i,
                 truncate = tvec, max.support = max.support))
gaitpoisson.mix(alter = avec, inflate = ivec)
with(gdata, table(y1))
gaitpxfit <- vglm(y1 ~ x2, crit = "coef", trace = TRUE, data = gdata,
                  gaitpoisson.mix(alter = avec, inflate = ivec,
                                  truncate = tvec, eq.ap = TRUE,
                                  eq.ip = TRUE, max.support = max.support))
head(fitted(gaitpxfit, type.fitted = "Pstr.i"))
head(predict(gaitpxfit))
coef(gaitpxfit, matrix = TRUE)
summary(gaitpxfit)