| ergm-constraints {ergm} | R Documentation |
ergm is used to fit exponential-family random graph
models (ERGMs), in which
the probability of a given network, y, on a set of nodes is
$h(y)\exp\{\eta(\theta)\cdot g(y)\}/c(\theta)$, where
$h(y)$ is the reference measure (usually $h(y)=1$),
$g(y)$ is a vector of network statistics for y,
$\eta(\theta)$ is a natural parameter vector of the same
length (with $\eta(\theta)=\theta$ for most terms), and $c(\theta)$ is the
normalizing constant for the distribution.
This page describes the constraints (the networks y for which h(y)>0)
that are included with the
ergm package. Other packages may add new
constraints.
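To make the roles of h(y), g(y), and c(θ) concrete, the following brute-force sketch enumerates every undirected network on three nodes and computes exact probabilities. It is an illustration under assumed choices, not how ergm fits models: g(y) is taken to be the edge count, η(θ)=θ, and h(y) is a hypothetical indicator that the network has exactly two edges, so networks outside this constrained sample space get probability zero.

```r
# Brute-force sketch (not ergm's algorithm): exact ERGM probabilities for all
# undirected networks on 3 nodes, i.e., 3 dyads and 2^3 = 8 possible networks.
# Assumed model: g(y) = number of edges, eta(theta) = theta, and h(y) is a
# hypothetical constraint indicator that is 1 only for networks with 2 edges.
networks <- expand.grid(d12 = 0:1, d13 = 0:1, d23 = 0:1)  # one network per row

g <- function(y) sum(y)                   # network statistic: edge count
h <- function(y) as.numeric(sum(y) == 2)  # reference measure: constraint indicator

theta <- 0.5
weights <- apply(networks, 1, function(y) h(y) * exp(theta * g(y)))
c_theta <- sum(weights)                   # normalizing constant c(theta)
probs <- weights / c_theta

print(cbind(networks, prob = round(probs, 4)))
```

Only the three two-edge networks have positive probability, each 1/3 whatever the value of θ: once the edge count is fixed by the constraint, an edges coefficient carries no information. Replacing h with function(y) 1 recovers the unconstrained model.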
A constraints formula is a one- or two-sided formula whose left-hand
side is an optional direct selection of the InitErgmProposal function
and whose right-hand
side is a series of one or more terms separated by “+”
and “-” operators, specifying the constraint.
The sample space (over and above the reference distribution) is determined by iterating over the constraints terms from left to right, each term updating it as follows:
If the constraint introduces complex dependence structure
(e.g., constrains degree or number of edges in the network), then
this constraint always restricts the sample space. It may only have
a “+” sign.
If the constraint only restricts the set of dyads that may
vary in the sample space (e.g., block-diagonal structure or fixing
specific dyads at specific values) and has a “+” sign,
the set of dyads that may vary is restricted to those that may vary
according to this constraint and all the constraints to date.
If the constraint only restricts the set of dyads that may
vary in the sample space but has a “-” sign,
the set of dyads that may vary is expanded to those that may vary
according to either this constraint or all the constraints to date.
For example, a constraints formula ~a-b+c-d in which all
constraints are dyad-independent will allow dyads permitted by either 'a' or 'b',
but only if they are also permitted by 'c', as well as all dyads permitted
by 'd'. If 'A', 'B', 'C', and 'D' were logical matrices, the matrix of
variable dyads would be equal to '((A|B)&C)|D'.
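The left-to-right rule for dyad-level constraints can be mimicked with plain logical matrices. This sketch is illustrative only, not ergm's implementation: each term is a hypothetical list holding a sign and a logical matrix (TRUE meaning the dyad may vary), and the fold starts from "every dyad may vary", which corresponds to having no constraints.

```r
# Illustrative sketch (not ergm's implementation): apply dyad-level
# constraints left to right. "+" intersects the free dyads with the term's
# mask; "-" takes their union with it.
combine_dyadic <- function(terms, n) {
  free <- matrix(TRUE, n, n)  # no constraints yet: every dyad may vary
  for (term in terms) {
    free <- if (term$sign == "+") free & term$mat else free | term$mat
  }
  free
}

set.seed(42)
n <- 5
random_mask <- function() matrix(runif(n * n) < 0.5, n, n)
A <- random_mask(); B <- random_mask(); C <- random_mask(); D <- random_mask()

# ~a-b+c-d: the leading term carries an implicit "+"
free <- combine_dyadic(list(
  list(sign = "+", mat = A),
  list(sign = "-", mat = B),
  list(sign = "+", mat = C),
  list(sign = "-", mat = D)
), n)

print(identical(free, ((A | B) & C) | D))  # TRUE
```

Because a "+" term can only remove free dyads and a "-" term can only add them, order matters: ~a+c-d yields (A&C)|D, whereas ~a-d+c yields (A|D)&C.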
Terms with a positive sign can be viewed as “adding” a constraint while those with a negative sign can be viewed as “relaxing” a constraint.
The dot (.) in a constraints formula has a special role. Most of the time, it is a placeholder for no constraints, as is NULL: all networks of a particular size and type have non-zero probability.
However, if the network on the LHS of the ergm formula has a %ergmlhs% "constraints" and/or %ergmlhs% "obs.constraints" attribute, it will be substituted in place of the dot. To avoid this substitution (i.e., to ignore the %ergmlhs% setting), pass either NULL for no constraints or an overriding constraints formula that does not contain the dot.
Constraints included with the ergm package
Dyads(fix=NULL, vary=NULL) (dyad-independent): This is an “operator” constraint that takes one or two ergm formulas. These formulas should contain only dyad-independent terms. For the terms in the fix= formula, dyads that affect the network statistic (i.e., have a nonzero change statistic) for any of the terms will be fixed at their current values. For the terms in the vary= formula, only those dyads that change at least one of the terms will be allowed to vary, and all others will be fixed. If both formulas are given, the dyads that vary under either one will be allowed to vary. Note that a formula passed to Dyads without an argument name will default to fix=.
bd(attribs,maxout,maxin,minout,minin): Constrain the maximum and minimum vertex degree. See the “Placing Bounds on Degrees” section for more information.
degrees and nodedegrees: Preserve the degree of each vertex of the given network: only networks whose vertex degrees are the same as those in the network passed in the model formula have non-zero probability. If the network is directed, both indegree and outdegree are preserved.
odegrees, idegrees, b1degrees, b2degrees: For directed networks, odegrees preserves the outdegree of each vertex of the given
network while allowing indegree to vary, and conversely for
idegrees. b1degrees and b2degrees perform a
similar function for bipartite networks.
degreedist: Preserve the degree distribution of the given network: only networks whose degree distributions are the same as those in the network passed in the model formula have non-zero probability.
dyadnoise(p01,p10): A soft constraint to adjust the sampled distribution for
dyad-level noise with known perturbation probabilities. It is
assumed that the observed LHS network is a noisy observation of
some unobserved true network, with p01 giving the dyadwise
probability of erroneously observing a tie where the true network
had a non-tie and p10 giving the dyadwise probability of
erroneously observing a non-tie where the true network had a tie.
p01 and p10 can either both be scalars or both
be adjacency matrices, of the same dimension as the LHS
network, giving these probabilities.
See Karwa et al. (2016) for an application.
idegreedist and odegreedist: Preserve the indegree or outdegree distribution (respectively) of the given network.
edges: Preserve the edge count of the given network: only networks having the same number of edges as the network passed in the model formula have non-zero probability.
observed (dyad-independent): Preserve the observed dyads of the given network.
fixedas(present,absent) (dyad-independent): Preserve the edges in 'present' and preclude the edges in 'absent'. Both 'present' and 'absent' may be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.
fixallbut(free.dyads) (dyad-independent): Preserve the status of all dyads except those in free.dyads. free.dyads may be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.
egocentric(attr = NULL, direction = c("both","out","in")) (dyad-independent): Preserve values of dyads incident on vertices for which the attribute attr (see Specifying Vertex Attributes and Levels for details) is TRUE or, if attr is NULL, for which the vertex attribute "na" is FALSE. For directed networks, direction=="out" preserves only the out-dyads of those actors, and direction=="in" preserves only their in-dyads.
blocks(attr = NULL, levels = NULL, levels2 = FALSE, b1levels = NULL, b2levels = NULL) (dyad-independent): Constrain "blocks" of dyads: any dyad whose toggle would produce a nonzero change statistic for a nodemix term with the same arguments will be fixed. Note that the levels2 argument has a different default value for blocks than it does for nodemix.
blockdiag(attr) (dyad-independent): Force a block-diagonal structure (and its bipartite analogue) on
the network. Only dyads (i,j) for which
attr(i)==attr(j) can have edges. See Specifying Vertex Attributes and Levels (?nodal_attributes) for the ways to specify nodal attributes and expressions.
Note that the current implementation requires that blocks be contiguous for “unipartite” graphs, and for bipartite graphs, they must be contiguous within a partition and must have the same ordering in both partitions. (They do not, however, require that all blocks be represented in both partitions, but those that overlap must have the same order.)
If multiple block-diagonal constraints are given, or if
attr is a vector with multiple attribute names, blocks
will be constructed on all attributes matching.
Not all combinations of the above are supported.
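Several of the constraints above are easiest to understand as tests of whether a candidate network lies in the sample space. The helpers below are hypothetical (not part of ergm) and express that membership test for edges, degrees, degreedist, and blockdiag on undirected networks stored as symmetric 0/1 adjacency matrices. They illustrate the definitions only, not how ergm's samplers enforce them.

```r
# Hypothetical helpers (not part of ergm): does candidate y2 lie in the sample
# space that a constraint defines relative to the observed network y?
# Both are symmetric 0/1 adjacency matrices of undirected networks.
same_edges      <- function(y, y2) sum(y[upper.tri(y)]) == sum(y2[upper.tri(y2)])
same_degrees    <- function(y, y2) all(rowSums(y) == rowSums(y2))
same_degreedist <- function(y, y2) all(sort(rowSums(y)) == sort(rowSums(y2)))
# blockdiag(attr): ties are allowed only between nodes with equal attr values
blockdiag_ok    <- function(y2, attr) all(y2[outer(attr, attr, "!=")] == 0)

# Build a symmetric adjacency matrix from a two-column edgelist
from_edges <- function(n, el) {
  y <- matrix(0, n, n)
  y[el] <- 1
  y[el[, 2:1, drop = FALSE]] <- 1
  y
}

y  <- from_edges(4, rbind(c(1, 2), c(2, 3), c(3, 4)))  # path: degrees 1 2 2 1
y2 <- from_edges(4, rbind(c(1, 3), c(3, 2), c(2, 4)))  # same degree for every node
y3 <- from_edges(4, rbind(c(1, 2), c(1, 3), c(1, 4)))  # star: same edge count only
y4 <- from_edges(4, rbind(c(1, 2), c(1, 3), c(3, 4)))  # degrees 2 1 2 1

print(c(edges = same_edges(y, y3), degrees = same_degrees(y, y3)))           # TRUE FALSE
print(c(degrees = same_degrees(y, y4), degreedist = same_degreedist(y, y4))) # FALSE TRUE
print(blockdiag_ok(y, attr = c("a", "a", "b", "b")))  # FALSE: tie 2-3 crosses blocks
```

The y4 case shows why degreedist is weaker than degrees: the multiset of degrees is unchanged even though individual nodes' degrees differ.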
Placing Bounds on Degrees
There are many situations in which one may wish to condition on the
number of in-edges or out-edges of a node: as a
consequence of some intrinsic property of that node (e.g., to control for
activity or popularity processes); to account
for known outliers of some kind, whose indegree we wish to limit; because of an
intrinsic property of the sampling scheme that produced the data (e.g.,
the survey asked everyone to name only three friends in total); or as a
function of the attributes of the nodes to which a node has edges
(e.g., we specify that nodes designated “male” have a maximum number
of out-edges to nodes designated “female”). To accomplish this, we
use the constraints term bd.
Let's consider the simple cases first. Suppose you want to condition on the total number of degrees regardless of attributes. That is, if you had a survey that asked respondents to name three alters and no more, then you might want to limit your maximal outdegree to three without regard to any of the alters' attributes. The argument is then:
constraints=~bd(maxout=3)
Similar calls are used to restrict the number of indegrees
(maxin), the minimum number of outdegrees
(minout), and the minimum number of indegrees
(minin).
You can also set ego-specific limits. For example:
constraints=~bd(maxout=rep(c(3,4),c(36,35)))
limits the outdegree of the first 36 nodes to 3 and that of the remaining 35 nodes to 4.
Multiple restrictions can be combined in a single bd term.
In general, the bd term can contain up to five arguments:
bd(attribs=attribs,
maxout=maxout,
maxin=maxin,
minout=minout,
minin=minin)
Omitted arguments are unrestricted, and arguments of length 1
are replicated out to all nodes (as above). If an individual
entry in maxout,..., minin is NA then
no restriction of that kind is applied to that actor.
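The rules for the simple, attribute-free form of bd can be written out as a feasibility check. This is a hypothetical sketch, not ergm's code: within_bd() takes a directed network as a 0/1 adjacency matrix with senders in rows, recycles length-1 bounds to all nodes, and treats NA (including an omitted argument) as no restriction.

```r
# Hypothetical helper (not ergm's code): does the directed network y
# (0/1 adjacency matrix, rows = senders) satisfy bd()-style degree bounds?
within_bd <- function(y, maxout = NA, maxin = NA, minout = NA, minin = NA) {
  n <- nrow(y)
  outdeg <- rowSums(y)
  indeg <- colSums(y)
  ok <- function(deg, bound, cmp) {
    bound <- rep_len(bound, n)             # length-1 bounds apply to every node
    all(is.na(bound) | cmp(deg, bound))    # NA: no restriction for that node
  }
  ok(outdeg, maxout, `<=`) && ok(indeg, maxin, `<=`) &&
    ok(outdeg, minout, `>=`) && ok(indeg, minin, `>=`)
}

y <- matrix(0, 5, 5)
y[1, 2:5] <- 1   # node 1 names four alters
y[2, 3] <- 1

print(within_bd(y, maxout = 3))                  # FALSE: node 1 has outdegree 4
print(within_bd(y, maxout = c(4, 3, 3, 3, 3)))   # TRUE
print(within_bd(y, maxout = c(NA, 3, 3, 3, 3)))  # TRUE: NA lifts node 1's bound
print(within_bd(y, minin = 1))                   # FALSE: node 1 has indegree 0
```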
In general, attribs is a matrix of the attributes on
which we are conditioning. The dimensions of attribs
are n_nodes rows by attrcount columns, where
attrcount is the number of distinct attribute values
on which we want to condition (i.e., a separate column is
required for “male” and “female” if we want to condition on
the number of ties to both “male” and “female” partners).
The value of attribs[n, i], therefore, is TRUE
if node n has attribute value i, and FALSE otherwise.
(Note that, since each column represents only a single value
of a single attribute, the values of this matrix are all
Boolean (TRUE or FALSE).) It is important to
note that attribs is a matrix of nodal attributes,
not alter attributes.
So, for instance, if we wanted to construct an attribs matrix
with two columns, one each for male and female attribute
values (we are conditioning on these values of the attribute
“sex”), and the attribute sex is represented in ads.sex as
an n_nodes-long vector of 0s and 1s (0 for men, 1 for women),
then our code would look as follows:
# male column: bit vector, TRUE for males
attrsex1 <- (ads.sex == 0)
# female column: bit vector, TRUE for females
attrsex2 <- (ads.sex == 1)
# now create attribs matrix
attribs <- matrix(ncol=2,nrow=71, data=c(attrsex1,attrsex2))
maxout is a matrix indexed by node and alter attribute value, with the same
dimensions as the attribs matrix: n_nodes
rows by attrcount columns. The value of maxout[n,i],
therefore, is the maximum number of outdegrees permitted
from node n to nodes with attribute value i (where NA
means there is no maximum).
For example: if we wanted to create a maxout matrix to work
with our attribs matrix above, with a maximum from every
node of five outedges to males and five outedges to females,
our code would look like this:
# every node has maximum of 5 outdegrees to male alters
maxoutsex1 <- rep(5, 71)
# every node has maximum of 5 outdegrees to female alters
maxoutsex2 <- rep(5, 71)
# now create maxout matrix
maxout <- cbind(maxoutsex1, maxoutsex2)
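The attribs and maxout matrices fit together through a matrix product: if y is the adjacency matrix of a directed network (senders in rows), then y %*% attribs counts, for each node n and attribute column i, the out-ties from n to alters with attribute value i, which is exactly the quantity that maxout[n, i] bounds. This sketch assumes a made-up ads.sex vector (36 males followed by 35 females) and a hypothetical helper respects_maxout(); it only checks a given network and does not reproduce how ergm enforces bd.

```r
n <- 71
ads.sex <- rep(c(0, 1), c(36, 35))  # hypothetical toy data: 0 = male, 1 = female

# Same attribs and maxout as above, built compactly
attribs <- cbind(male = ads.sex == 0, female = ads.sex == 1)
maxout  <- cbind(male = rep(5, n), female = rep(5, n))

# counts[n, i]: out-ties from node n to alters with attribute value i
ties_by_attr <- function(y, attribs) y %*% (attribs + 0)

# Hypothetical helper: TRUE if every node respects its per-attribute maximum
respects_maxout <- function(y, attribs, maxout) {
  all(is.na(maxout) | ties_by_attr(y, attribs) <= maxout)
}

y <- matrix(0, n, n)
y[1, 37:42] <- 1                            # node 1 names six females
print(ties_by_attr(y, attribs)[1, ])        # male 0, female 6
print(respects_maxout(y, attribs, maxout))  # FALSE

maxout[1, "female"] <- NA                   # lift node 1's bound on female alters
print(respects_maxout(y, attribs, maxout))  # TRUE
```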
The maxin, minout, and minin matrices
are constructed exactly like the maxout matrix,
except for the maximum allowed indegree, the minimum allowed
outdegree, and the minimum allowed indegree, respectively.
Note that in an undirected network, we only look at the outdegree
matrices; maxin and minin will both be ignored
in this case.
References
Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M (2008a). A statnet Tutorial. Journal of Statistical Software, 24(8). https://www.jstatsoft.org/v24/i08/.
Hunter DR, Handcock MS (2006). Inference in Curved Exponential Family Models for Networks. Journal of Computational and Graphical Statistics.
Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M (2008b). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software, 24(3). https://www.jstatsoft.org/v24/i03/.
Karwa V, Krivitsky PN, Slavković AB (2016). Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models. Journal of the Royal Statistical Society, Series C, 66(3), 481-500. doi:10.1111/rssc.12185.
Krivitsky PN (2012). Exponential-Family Random Graph Models for Valued Networks. Electronic Journal of Statistics, 6, 1100-1128. doi:10.1214/12-EJS696.
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24(4). https://www.jstatsoft.org/v24/i04/.