| impute {bnlearn} | R Documentation |
Impute missing values in a data set or predict a variable from a Bayesian network.
```r
## S3 method for class 'bn.fit'
predict(object, node, data, method = "parents", ..., prob = FALSE, debug = FALSE)
impute(object, data, method, ..., debug = FALSE)
```
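| Argument | Description |
|---|---|
| object | an object of class bn.fit. predict() also accepts an object of class bn. |
| data | a data frame containing the data to be imputed, or the observations to predict from. Complete observations are ignored by impute(). |
| node | a character string, the label of the node whose values are predicted (predict() only). |
| method | a character string, the method used to impute the missing values or predict new ones: either "parents" or "bayes-lw". The default value is "parents". |
| ... | additional arguments for the imputation method, such as n and from for "bayes-lw". See below. |
| prob | a boolean value. If TRUE and object is a discrete network, the probabilities used for prediction are attached to the predicted values as an attribute called prob. |
| debug | a boolean value. If TRUE, debugging output is printed; otherwise the function is silent. |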
predict() returns the predicted values for node given the data
specified by data and the fitted network. Depending on the value of
method, the predicted values are computed as follows.
parents: the predicted values are computed by plugging in
the new values for the parents of node in the local probability
distribution of node extracted from object.
bayes-lw: the predicted values are computed by averaging
likelihood weighting simulations performed using all the available nodes
as evidence, except for the node whose values are being predicted. The
number of random samples averaged for each new observation is controlled
by the optional argument n; the default is 500. If the variable being
predicted is discrete, the predicted level is the one with the highest
conditional probability. If the variable is continuous, the predicted
value is the expected value of the conditional distribution. The variables
used to compute the predicted values can be specified with the optional
argument from; the default is to use all the relevant variables in the data.
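For a Gaussian node, the "parents" method amounts to evaluating the conditional mean of the node's local distribution at the parents' values. The base-R sketch below illustrates this under stated assumptions: it uses a hypothetical linear Gaussian local distribution for F with parents A and D, with made-up coefficients; it does not call bnlearn and is not its implementation.

```r
# A minimal sketch of the "parents" method for a Gaussian node, ASSUMING a
# linear Gaussian local distribution F = b0 + bA * A + bD * D + error.
# The coefficients are hypothetical values, not estimates from any data set.
coefs <- c("(Intercept)" = 1.5, A = 2.0, D = -0.5)

# Plug the parents' values into the local distribution: the prediction is
# the conditional mean of F, so the error term plays no role.
predict_from_parents <- function(coefs, newdata) {
  parents <- names(coefs)[-1]
  X <- cbind(1, as.matrix(newdata[, parents, drop = FALSE]))
  drop(X %*% coefs)
}

newdata <- data.frame(A = c(0, 1, 2), D = c(4, 2, 0))
predict_from_parents(coefs, newdata)
# [1] -0.5  2.5  5.5
```

For a discrete node the same idea applies, with the conditional probability table column selected by the parents' configuration taking the place of the regression.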
impute() is based on predict(), and can impute missing values
with the same methods (parents and bayes-lw). The
latter accepts an additional argument n, the number of random
samples averaged for each observation.
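The effect of n in bayes-lw can be seen on a toy problem. The base-R sketch below applies likelihood weighting to a hypothetical two-node discrete network A -> B with made-up probabilities, predicting A from the evidence B = "b1". It only illustrates the general technique; the network, the probabilities and the function lw_predict_A() are assumptions, not part of bnlearn.

```r
# A minimal sketch of likelihood weighting on a hypothetical network A -> B.
# All probabilities below are made up for illustration.
p_A <- c(a1 = 0.3, a2 = 0.7)             # P(A)
p_B1_given_A <- c(a1 = 0.9, a2 = 0.2)    # P(B = "b1" | A)

lw_predict_A <- function(n = 500) {
  # Sample the non-evidence node A from its distribution (it has no parents).
  a <- sample(names(p_A), size = n, replace = TRUE, prob = p_A)
  # The evidence node B is not sampled: each particle is weighted by the
  # likelihood of the observed value B = "b1" given its sampled parent.
  w <- p_B1_given_A[a]
  # Weighted estimate of P(A | B = "b1").
  post <- tapply(w, factor(a, levels = names(p_A)), sum) / sum(w)
  post[is.na(post)] <- 0
  # Discrete target: predict the level with the highest estimated probability.
  list(prob = post, predicted = names(which.max(post)))
}

set.seed(1)
lw_predict_A(n = 5000)
```

The exact posterior here is P(A = a1 | B = b1) = 0.27 / 0.41, about 0.66, and larger values of n bring the weighted estimate closer to it. For a continuous target the analogous prediction would be the weighted mean sum(w * x) / sum(w) of the sampled values.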
predict() returns a numeric vector (for Gaussian and conditional
Gaussian nodes), a factor (for categorical nodes) or an ordered factor (for
ordinal nodes). If prob = TRUE and the network is discrete, the
probabilities used for prediction are attached to the predicted values as
an attribute called prob.
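The prob attribute is an ordinary R attribute attached to the vector of predictions. The base-R sketch below mimics that pattern with a hypothetical matrix of conditional probabilities; the layout used here (one row per level, one column per observation) is an assumption for illustration and may differ from what predict() actually returns.

```r
# Hypothetical conditional probabilities for a discrete node with levels
# "a", "b", "c": one row per level, one column per observation (made up).
probs <- matrix(c(0.6, 0.3, 0.1,
                  0.2, 0.5, 0.3),
                nrow = 3, dimnames = list(c("a", "b", "c"), NULL))

# Predict the most probable level for each observation. Note that
# which.max() resolves ties deterministically by taking the first level,
# unlike the random tie breaking used by predict().
predicted <- factor(rownames(probs)[apply(probs, 2, which.max)],
                    levels = rownames(probs))

# Attach the probabilities as an attribute called "prob".
attr(predicted, "prob") <- probs

as.character(predicted)   # "a" for the first observation, "b" for the second
attr(predicted, "prob")
```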
impute() returns a data frame with the same structure as data.
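The contract of impute() (fill in only the missing entries and return a data frame shaped like the input) can be sketched in base R as follows. The helper impute_column() and the linear predictor passed to it are hypothetical stand-ins for a fitted network, not bnlearn functions.

```r
# A minimal sketch of impute()-style behaviour: only the incomplete rows of
# the target column are filled in, and the data frame keeps its structure.
# impute_column() is a hypothetical helper, not part of bnlearn.
impute_column <- function(data, node, predictor) {
  missing <- is.na(data[[node]])
  if (any(missing)) {
    # Predict only for incomplete observations; complete ones are untouched.
    data[missing, node] <- predictor(data[missing, , drop = FALSE])
  }
  data
}

df <- data.frame(A = c(0, 1, 2, 3), F = c(1.0, NA, 5.5, NA))
# Hypothetical predictor standing in for a fitted network: F = 1.5 + 2 * A.
filled <- impute_column(df, "F", function(rows) 1.5 + 2 * rows$A)
filled$F   # 1.0 3.5 5.5 7.5
```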
Ties in prediction are broken using Bayesian tie breaking, that is, by sampling at random among the tied values. Therefore, the random seed must be set (for example with set.seed()) to obtain reproducible results.
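Random tie breaking and its interaction with the seed can be illustrated in a few lines of base R. The function break_ties() below is a hypothetical sketch of the idea (pick uniformly among the values sharing the highest probability), not bnlearn's code.

```r
# A minimal sketch of random tie breaking: among the levels that share the
# highest probability, pick one uniformly at random.
# break_ties() is a hypothetical helper, not part of bnlearn.
break_ties <- function(p) {
  # Exact comparison: levels count as tied only if their values are identical.
  best <- names(p)[p == max(p)]
  # Skip sampling when there is a unique maximum.
  if (length(best) == 1) best else sample(best, 1)
}

p <- c(a = 0.4, b = 0.4, c = 0.2)
set.seed(42); first <- break_ties(p)
set.seed(42); second <- break_ties(p)
identical(first, second)   # TRUE: the same seed gives the same choice
```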
predict() accepts either a bn or a bn.fit object as its
first argument. For the former, the parameters of the network are fitted on
data, that is, the observations whose class labels the function is
trying to predict.
Author: Marco Scutari
```r
library(bnlearn)
set.seed(42)  # sample() and method = "bayes-lw" both use random numbers

# missing data imputation.
with.missing.data = gaussian.test
with.missing.data[sample(nrow(with.missing.data), 500), "F"] = NA
fitted = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
           gaussian.test)
imputed = impute(fitted, with.missing.data)

# predicting a variable in the test set.
training = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
             gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]
predicted = predict(training, node = "F", data = test)

# obtain the conditional probabilities of the values of a single variable
# given a subset of the others; these are the probabilities used to
# determine the predicted values.
fitted = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
evidence = data.frame(A = factor("a", levels = levels(learning.test$A)),
                      F = factor("b", levels = levels(learning.test$F)))
predicted = predict(fitted, "C", evidence,
              method = "bayes-lw", prob = TRUE)
attr(predicted, "prob")
```