| tree_distance {castor} | R Documentation |
Given two rooted phylogenetic trees with identical tips, calculate their difference using a distance metric.
tree_distance(treeA, treeB, tipsA2B=NULL, metric="RF", normalized=FALSE)
treeA |
A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge. |
treeB |
A rooted tree of class "phylo", with the same number of tips as |
tipsA2B |
Optional integer vector of size Ntips, mapping |
metric |
Character, specifying the distance metric to be used. Currently only the Robinson-Foulds ("RF") metric is implemented, which is the number of clusters (sets of tips descending from a node) in either of the trees but not shared by both trees (Robinson and Foulds, 1981; Day, 1985). The Robinson-Foulds metric does not take into account branch lengths. |
normalized |
Logical, specifying whether the calculated distance should be normalized to be between 0 and 1. For the Robinson-Foulds metric, the distance will be normalized by dividing it by the total number of nodes in the two trees. |
If the trees differ in theis tips, they must be pruned down to their common set of tips. If tips have different labels in the two trees, but are nevertheless equivalent, the mapping between the two trees must be provided using tipsA2B. The trees may include multi-furcations as well as mono-furcations (i.e. nodes with only one child).
Note that under some Robinson-Foulds variants the trees can be unrooted; in this present implementation trees must be rooted and the placement of the root influences the distance, following the definition by Day (1985).
The asymptotic average time complexity of this function is O(Nedges*log(Nedges)*log(log(Nedges))) for a balanced bifurcating tree.
A single non-negative number, representing the distance between the two trees.
Stilianos Louca
Robinson, D. R., Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences. 53: 131-147.
Day, W. H. E. (1985). Optimal algorithms for comparing trees with labeled leaves. Journal of Classification. 2:7-28.
# generate a random tree
Ntips = 1000
treeA = generate_random_tree(list(birth_rate_intercept=1),
max_tips=Ntips)$tree
# create a second tree with slightly different topology
treeB = treeA
shuffled_tips = sample.int(Ntips, size=Ntips/10, replace=FALSE)
treeB$tip.label[shuffled_tips] = treeB$tip.label[sample(shuffled_tips)]
# calculate Robinson-Foulds distance between trees
distance = tree_distance(treeA, treeB, metric="RF")