| instrumental_forest {grf} | R Documentation |
Trains an instrumental forest that can be used to estimate conditional local average treatment effects tau(x) identified using instruments. Formally, the forest estimates tau(x) = Cov[Y, Z | X = x] / Cov[W, Z | X = x]. Note that when the instrument Z and the treatment assignment W coincide, an instrumental forest is equivalent to a causal forest.
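The covariance ratio in the description can be illustrated without any forest at all. The sketch below uses base R only and simulated data with a constant effect of 2, so the conditioning on X drops out; the data-generating process and the variable names (U, tau.iv, tau.naive) are assumptions made for illustration and are not part of grf.

```r
# Simulated data with a constant treatment effect of 2 (illustrative only).
set.seed(42)
n <- 10000
Z <- rbinom(n, 1, 0.5)        # binary instrument, independent of U
U <- rnorm(n)                 # unobserved confounder
W <- Z + U + rnorm(n)         # real-valued treatment, driven by Z and U
Y <- 2 * W + U + rnorm(n)     # outcome; U confounds the W-Y relationship

# Instrumental (covariance-ratio) estimate: Cov[Y, Z] / Cov[W, Z].
tau.iv <- cov(Y, Z) / cov(W, Z)

# Naive regression slope of Y on W, which absorbs the confounding through U.
tau.naive <- cov(Y, W) / var(W)

print(c(iv = tau.iv, naive = tau.naive))
```

In this setup the ratio should land close to the true effect of 2, while the naive slope is biased upward. Replacing Z with W in the ratio gives exactly the naive slope, which is the unconditional analogue of the remark that the two forests coincide when Z and W do.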
instrumental_forest(X, Y, W, Z, Y.hat = NULL, W.hat = NULL, Z.hat = NULL,
  sample.fraction = 0.5, mtry = NULL, num.trees = 2000, num.threads = NULL,
  min.node.size = NULL, honesty = TRUE, honesty.fraction = NULL,
  ci.group.size = 2, reduced.form.weight = 0, alpha = 0.05,
  imbalance.penalty = 0, stabilize.splits = TRUE,
  compute.oob.predictions = TRUE, seed = NULL, clusters = NULL,
  samples_per_cluster = NULL)
X |
The covariates used in the instrumental regression. |
Y |
The outcome. |
W |
The treatment assignment (may be binary or real). |
Z |
The instrument (may be binary or real). |
Y.hat |
Estimates of the expected responses E[Y | Xi], marginalizing over treatment. If Y.hat = NULL, these are estimated using a separate regression forest. |
W.hat |
Estimates of the treatment propensities E[W | Xi]. If W.hat = NULL, these are estimated using a separate regression forest. |
Z.hat |
Estimates of the instrument propensities E[Z | Xi]. If Z.hat = NULL, these are estimated using a separate regression forest. |
sample.fraction |
Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. |
mtry |
Number of variables tried for each split. |
num.trees |
Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. |
num.threads |
Number of threads used in training. If set to NULL, the software automatically selects an appropriate number. |
min.node.size |
A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. |
honesty |
Whether to use honest splitting (i.e., sub-sample splitting). |
honesty.fraction |
The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. When using the defaults (honesty = TRUE and honesty.fraction = NULL), half of the data will be used for determining splits. |
ci.group.size |
The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. |
reduced.form.weight |
Weight controlling how strongly, if at all, splits are regularized towards a naive splitting criterion that ignores the instrument (and instead emulates a causal forest). The default of 0 applies no such regularization. |
alpha |
A tuning parameter that controls the maximum imbalance of a split. |
imbalance.penalty |
A tuning parameter that controls how harshly imbalanced splits are penalized. |
stabilize.splits |
Whether or not the instrument should be taken into account when determining the imbalance of a split (experimental). |
compute.oob.predictions |
Whether out-of-bag (OOB) predictions on the training set should be precomputed. |
seed |
The seed for the C++ random number generator. |
clusters |
Vector of integers or factors specifying which cluster each observation corresponds to. |
samples_per_cluster |
If sampling by cluster, the number of observations to be sampled from each cluster when training a tree. If NULL, we set samples_per_cluster to the size of the smallest cluster. If some clusters are smaller than samples_per_cluster, the whole cluster is used every time the cluster is drawn. Note that clusters with fewer than samples_per_cluster observations get relatively smaller weight than others in training the forest, i.e., the contribution of a given cluster to the final forest scales with the minimum of the number of observations in the cluster and samples_per_cluster. |
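The sample-size arithmetic described for sample.fraction, honesty.fraction, and samples_per_cluster can be worked through by hand. The following base R sketch uses made-up numbers (n = 1000 and three hypothetical clusters) and ignores any rounding the package may apply internally; it also assumes, as is usual for honest splitting, that the part of each subsample not used for splits is the part used to populate the leaves.

```r
# Hypothetical data size and the default-like settings described above.
n <- 1000
sample.fraction <- 0.5
honesty.fraction <- 0.5   # the value implied by the defaults, per the text

# Observations drawn for each tree, before the honesty split.
subsample.size <- n * sample.fraction              # 500

# With honesty = TRUE the subsample is cut again by honesty.fraction.
split.size <- subsample.size * honesty.fraction    # 250 used to choose splits
estimation.size <- subsample.size - split.size     # 250 left for the leaves (assumed)

# Hypothetical cluster sizes, and a user-chosen samples_per_cluster.
cluster.sizes <- c(a = 10, b = 50, c = 200)
samples_per_cluster <- 50

# Each cluster contributes min(cluster size, samples_per_cluster) observations.
contribution <- pmin(cluster.sizes, samples_per_cluster)   # a = 10, b = 50, c = 50

# With samples_per_cluster = NULL, the smallest cluster size would be used.
default.samples_per_cluster <- min(cluster.sizes)          # 10
```

Here cluster a is always used in full but still carries less weight than b or c, while the 200 observations in c are capped at 50 per draw.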
A trained instrumental forest object.
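A minimal usage sketch follows, assuming the grf package is installed. The simulated data, the unobserved confounder U, and the effect function pmax(X[, 1], 0) are invented for illustration; only instrumental_forest() and the predict() method with estimate.variance are taken from the package, and the reduced num.trees is chosen purely to keep the example fast.

```r
library(grf)

# Simulated data (illustrative assumptions, not part of the package).
set.seed(1)
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
Z <- rbinom(n, 1, 0.5)                     # binary instrument
U <- rnorm(n)                              # unobserved confounder
W <- as.numeric(Z + U + rnorm(n) > 0.5)    # binary treatment influenced by Z and U
tau <- pmax(X[, 1], 0)                     # effect varies with the first covariate
Y <- tau * W + U + rnorm(n)

# Train the forest; nuisance estimates (Y.hat, W.hat, Z.hat) are left at NULL
# so that they are fitted by separate regression forests.
forest <- instrumental_forest(X, Y, W, Z, num.trees = 500)

# Out-of-bag predictions on the training data.
oob <- predict(forest)

# Predictions and approximate 95% intervals on a small test grid.
X.test <- matrix(0, 10, p)
X.test[, 1] <- seq(-2, 2, length.out = 10)
pred <- predict(forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(pred$variance.estimates)
ci.lower <- pred$predictions - 1.96 * sigma.hat
ci.upper <- pred$predictions + 1.96 * sigma.hat
```

Because accurate confidence intervals generally need more trees than accurate predictions, num.trees would normally be left at its default or raised when the intervals matter.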