1. Introduction
Phylogenetic trees are often used together with associated data in biological studies. ggtree, a flexible R package for visualizing phylogenetic trees, was developed by Guangchuang Yu (Yu et al. 2017). It provides the geom_facet function to align associated graphs to the tree (Yu et al. 2018; Yu 2020). However, geom_facet does not support trees drawn with the circular, fan or radial layout. To solve this problem, we developed ggtreeExtra, which can align associated graphs to trees in circular, fan, radial and rectangular layouts. ggtreeExtra provides the geom_fruit function to align graphs to the tree. Each geom_fruit layer is placed at a new position, so we also developed geom_fruit_list to add multiple layers at the same position. Furthermore, an axis can be added to an external layer with axis.params=list(axis="x", ...) in geom_fruit, and grid lines with grid.params=list() in geom_fruit. These functions are built on ggplot2 and follow the grammar of graphics (Wickham 2016). More examples can be found in chapter 10 of the online book.
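The smallest useful geom_fruit call combines the pieces listed above: a tree drawn by ggtree, a data frame whose y column holds the tip labels, a ggplot2 geom, and optionally axis.params and grid.params. The sketch below assumes a random tree from ape::rtree and a hypothetical data frame d with columns id and value; it is an illustration, not the vignette's own data.

```r
library(ggtree)       # ggtree() and its circular/fan layouts
library(ggtreeExtra)  # geom_fruit()
library(ggplot2)      # aes(), geom_bar()

set.seed(42)
# Hypothetical inputs: a random 20-tip tree and one value per tip.
tr <- ape::rtree(20)
d <- data.frame(id = tr$tip.label, value = runif(20))

p <- ggtree(tr, layout = "circular") +
  geom_fruit(
    data = d,
    geom = geom_bar,
    mapping = aes(y = id, x = value),  # y must hold the tip labels
    orientation = "y",
    stat = "identity",
    axis.params = list(axis = "x"),    # axis text for the external ring
    grid.params = list()               # default grid lines for the ring
  )
p
```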
2. Install
You can install ggtreeExtra with the following code:
# for devel
if(!requireNamespace("remotes", quietly=TRUE)){
install.packages("remotes")
}
remotes::install_github("YuLab-SMU/ggtreeExtra")
# for release
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggtreeExtra")
3. Usage
To demonstrate the package, I use a tree file downloaded from plotTree. The associated datasets were simulated.
3.1 add single layer
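# Load the packages used in this vignette.
library(ggtreeExtra)
library(ggtree)
library(treeio)
library(ggplot2)
library(ggstar)     # geom_star(), scale_starshape_manual()
library(ggnewscale) # new_scale_fill()
# The path of the tree file.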
trfile <- system.file("extdata", "tree.nwk", package="ggtreeExtra")
# The path of file to plot tippoint.
tippoint1 <- system.file("extdata", "tree_tippoint_bar.csv", package="ggtreeExtra")
# The path of first layer outside of tree.
ring1 <- system.file("extdata", "first_ring_discrete.csv", package="ggtreeExtra")
# The path of second layer outside of tree.
ring2 <- system.file("extdata", "second_ring_continuous.csv", package="ggtreeExtra")
# Import the tree with read.tree. For other tree formats, use the corresponding reader from treeio.
tree <- read.tree(trfile)
# This dataset will be plotted as points and bars.
dat1 <- read.csv(tippoint1)
knitr::kable(head(dat1))
ID | Location | Length | Group | Abundance |
---|---|---|---|---|
DE0655_HCMC_2001 | HK | 0.1786629 | Yes | 12.331055 |
MS0111_HCMC_1996 | HK | 0.2105236 | Yes | 9.652052 |
MS0063_HCMC_1995 | HK | 1.4337983 | Yes | 11.584822 |
DE0490_HCMC_2000 | HK | 0.3823731 | Yes | 7.893231 |
DE0885_HCMC_2001 | HK | 0.8478901 | Yes | 12.117232 |
DE0891_HCMC_2001 | HK | 1.5038646 | Yes | 10.819734 |
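# This dataset will be plotted as a heatmap of discrete values (first ring).
dat2 <- read.csv(ring1)
knitr::kable(head(dat2))
ID | Pos | Type |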
---|---|---|
DE0846_HCMC_2001 | 8 | type2 |
MS0034_HCMC_1995 | 8 | type2 |
EG1017_HCMC_2009 | 6 | type2 |
KH18_HCMC_2009 | 5 | type2 |
10365_HCMC_2010 | 7 | type2 |
EG1021_HCMC_2009 | 1 | type1 |
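# This dataset will be plotted as a heatmap of continuous values (second ring).
dat3 <- read.csv(ring2)
knitr::kable(head(dat3))
ID | Type2 | Alpha |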
---|---|---|
MS0004_HCMC_1995 | p3 | 0.2256195 |
DE1150_HCMC_2002 | p2 | 0.2222086 |
MS0048_HCMC_1995 | p2 | 0.1881510 |
HUE57_HCMC_2010 | p3 | 0.1939088 |
DE1486_HCMC_2002 | p2 | 0.2018493 |
DE1165_HCMC_2002 | p3 | 0.1812997 |
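The tree here is read from a Newick file with read.tree. If your tree is stored in another format, such as NEXUS, a different reader is needed; treeio also provides readers for program outputs such as read.beast. The sketch below writes a random tree to a temporary NEXUS file (a stand-in for a file you already have) and reads it back with read.nexus from ape, the package treeio builds on.

```r
library(ape)  # rtree(), write.nexus(), read.nexus(), Ntip()

set.seed(7)
tr <- rtree(10)

# Stand-in for an existing NEXUS file: write the random tree to a temp file.
nex_file <- tempfile(fileext = ".nex")
write.nexus(tr, file = nex_file)

# Read it back; the result is a phylo object that ggtree() can draw.
tr2 <- read.nexus(nex_file)
tr2
```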
# The datasets are in long format, as ggplot2 expects. If your dataset is in wide format,
# you can convert it with melt() from reshape2 or pivot_longer() from tidyr.
# Use ggtree to draw the tree in a fan layout.
p <- ggtree(tree, layout="fan", open.angle=10, size=0.5)
#> Scale for 'y' is already present. Adding another scale for 'y', which will
#> replace the existing scale.
p
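The conversion from wide to long format mentioned above can be sketched with pivot_longer from tidyr. The table wide below is hypothetical: one row per tip and one Pos* column per position. The result has the same ID, Pos, Type shape as dat2.

```r
library(tidyr)  # pivot_longer()

# Hypothetical wide table: one row per tip, one column per position.
wide <- data.frame(
  ID   = c("tipA", "tipB", "tipC"),
  Pos1 = c("type1", "type2", "type1"),
  Pos2 = c("type2", "type2", "type1"),
  stringsAsFactors = FALSE
)

# One row per (tip, position) pair, which is what geom_fruit expects.
long <- pivot_longer(
  wide,
  cols = -ID,            # every column except the tip label
  names_to = "Pos",
  names_prefix = "Pos",  # strip the "Pos" prefix, keeping "1", "2"
  values_to = "Type"
)
long$Pos <- as.integer(long$Pos)
long
```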
## Option 1 (commented out): attach the annotation dataset to the tree with the %<+% operator of ggtree.
#p1 <- p %<+% dat1
#p1
## Then use geom_star (from ggstar) to add a point layer at the tips.
#p2 <- p1 +
# geom_star(
# mapping=aes(fill=Location, size=Length, starshape=Group),
# starstroke=0.2
# ) +
# scale_size_continuous(
# range=c(1, 3),
# guide=guide_legend(
# keywidth=0.5,
# keyheight=0.5,
# override.aes=list(starshape=15),
# order=2)
# ) +
# scale_fill_manual(
# values=c("#F8766D", "#C49A00", "#53B400", "#00C094", "#00B6EB", "#A58AFF", "#FB61D7"),
# guide="none" # don't show the legend.
# ) +
# scale_starshape_manual(
# values=c(1, 15),
# guide=guide_legend(
# keywidth=0.5, # adjust width of legend
# keyheight=0.5, # adjust height of legend
# order=1 # adjust the order of legend for multiple legends.
# ),
# na.translate=FALSE # to remove the NA legend.
# )
#p2
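What %<+% does in the commented-out option can be checked on a small example: it matches the first column of the annotation table to the tip labels and adds the other columns to the plot data, with NA for internal nodes. The tree and the table ann are hypothetical.

```r
library(ggtree)  # ggtree(), %<+%

set.seed(3)
tr <- ape::rtree(6)
# Hypothetical annotation table; its first column holds the tip labels.
ann <- data.frame(ID = tr$tip.label, Abundance = c(2, 4, 6, 8, 10, 12))

p <- ggtree(tr) %<+% ann
# The Abundance column is now part of the plot data, one value per tip.
p$data[p$data$isTip, c("label", "Abundance")]
```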
# Option 2: instead of %<+%, pass the annotation dataset through the data parameter of geom_fruit.
# In this case you must map y to the column holding the tip labels, here y=ID.
p2 <- p +
geom_fruit(
data=dat1,
geom=geom_star,
mapping=aes(y=ID, fill=Location, size=Length, starshape=Group),
position="identity",
starstroke=0.2
) +
scale_size_continuous(
range=c(1, 3), # the range of size.
guide=guide_legend(
keywidth=0.5,
keyheight=0.5,
override.aes=list(starshape=15),
order=2
)
) +
scale_fill_manual(
values=c("#F8766D", "#C49A00", "#53B400", "#00C094", "#00B6EB", "#A58AFF", "#FB61D7"),
guide="none"
) +
scale_starshape_manual(
values=c(1, 15),
guide=guide_legend(
keywidth=0.5,
keyheight=0.5,
order=1
)
)
p2
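Because geom_fruit aligns rows through the y column, IDs that are not tip labels cannot be placed on the tree. A quick base-R check with setdiff() finds such mismatches before plotting. The tree and the table ann below are hypothetical, with one deliberately wrong ID.

```r
library(ape)  # rtree()

set.seed(1)
tr <- rtree(5)  # tips are labelled t1 ... t5, in random order
# Hypothetical annotation table: four real tips plus one bad ID.
ann <- data.frame(ID = c(tr$tip.label[1:4], "not_a_tip"),
                  Abundance = c(3, 5, 2, 8, 1))

# IDs that match no tip and therefore cannot be aligned to the tree.
unmatched <- setdiff(ann$ID, tr$tip.label)
unmatched

# Tips that have no row in the annotation table.
missing_tips <- setdiff(tr$tip.label, ann$ID)
missing_tips
```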
# Next, add a heatmap layer to p2 using `geom_tile` of ggplot2.
# fill has already been mapped in p2, so call `new_scale_fill` to start a new fill scale
# before mapping another variable to fill.
p3 <- p2 +
new_scale_fill() +
geom_fruit(
data=dat2,
geom=geom_tile,
mapping=aes(y=ID, x=Pos, fill=Type),
offset=0.08, # The distance between layers, default is 0.03 of x range of tree.
pwidth=0.25 # width of the layer, default is 0.2 of x range of tree.
) +
scale_fill_manual(
values=c("#339933", "#dfac03"),
guide=guide_legend(keywidth=0.5, keyheight=0.5, order=3)
)
p3
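The effect of new_scale_fill can be seen without a tree. In the sketch below (hypothetical data d1 and d2), the first geom_tile layer uses a discrete manual fill scale, and after new_scale_fill the second layer gets its own continuous fill scale and its own legend.

```r
library(ggplot2)
library(ggnewscale)  # new_scale_fill()

# Hypothetical data: a discrete variable in row 1, a continuous one in row 2.
d1 <- data.frame(x = 1:3, y = 1, group = c("a", "b", "a"))
d2 <- data.frame(x = 1:3, y = 2, score = c(0.1, 0.5, 0.9))

p <- ggplot() +
  geom_tile(data = d1, aes(x = x, y = y, fill = group)) +
  scale_fill_manual(values = c(a = "#339933", b = "#dfac03")) +
  new_scale_fill() +  # later fill mappings use a fresh scale
  geom_tile(data = d2, aes(x = x, y = y, fill = score)) +
  scale_fill_gradient(low = "white", high = "#b22222")
p
```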
# Add another heatmap layer, mapping the continuous Alpha values to transparency.
p4 <- p3 +
new_scale_fill() +
geom_fruit(
data=dat3,
geom=geom_tile,
mapping=aes(y=ID, x=Type2, alpha=Alpha, fill=Type2),
pwidth=0.15,
axis.params=list(
axis="x", # add axis text of the layer.
text.angle=-45, # the angle of the axis text.
hjust=0 # adjust the horizontal position of the axis text.
)
) +
scale_fill_manual(
values=c("#b22222", "#005500", "#0000be", "#9f1f9f"),
guide=guide_legend(keywidth=0.5, keyheight=0.5, order=4)
) +
scale_alpha_continuous(
range=c(0, 0.4), # the range of alpha
guide=guide_legend(keywidth=0.5, keyheight=0.5, order=5)
)
# Then we add a bar layer outside of the tree.
p5 <- p4 +
new_scale_fill() +
geom_fruit(
data=dat1, # the Abundance column of dat1 is mapped to x.
geom=geom_bar,
mapping=aes(y=ID, x=Abundance, fill=Location),
pwidth=0.4,
stat="identity",
orientation="y", # bars grow along x, one bar per tip (y).
axis.params=list(
axis="x", # add axis text of the layer.
text.angle=-45, # the angle of the axis text.
hjust=0 # adjust the horizontal position of the axis text.
),
grid.params=list() # add the grid line of the external bar plot.
) +
scale_fill_manual(
values=c("#F8766D", "#C49A00", "#53B400", "#00C094", "#00B6EB", "#A58AFF", "#FB61D7"),
guide=guide_legend(keywidth=0.5, keyheight=0.5, order=6)
) +
theme(#legend.position=c(0.96, 0.5), # the position of legend.
legend.background=element_rect(fill=NA), # the background of legend.
legend.title=element_text(size=7), # the title size of legend.
legend.text=element_text(size=6), # the text size of legend.
legend.spacing.y = unit(0.02, "cm") # the distance of legends (y orientation).
)
p5
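The comments on offset and pwidth describe them as fractions of the tree's x range. Assuming that reading, the sketch below converts the values used in this section into branch-length units. The random 100-tip tree is a hypothetical stand-in for tree.nwk, and node.depth.edgelength() from ape is used as the x position of each node (its distance from the root), which is how rectangular and fan layouts place nodes when branch lengths are present.

```r
library(ape)  # rtree(), node.depth.edgelength()

set.seed(1024)
tr <- rtree(100)  # hypothetical stand-in for the tree read above

# x position of every node: distance from the root along the branches.
x <- node.depth.edgelength(tr)
x_range <- diff(range(x))  # the root sits at 0, so this is the tree height

# Fractions used in this section, converted to branch-length units.
widths <- c(
  offset_default = 0.03 * x_range,
  offset_p3      = 0.08 * x_range,
  pwidth_default = 0.20 * x_range,
  pwidth_p3      = 0.25 * x_range,
  pwidth_p4      = 0.15 * x_range,
  pwidth_p5      = 0.40 * x_range
)
round(widths, 3)
```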
3.2 add multiple layers at the same position
In this section, I randomly generate a tree and its associated datasets.
# To reproduce.
set.seed(1024)
# Generate a random tree with 100 tips.
tr <- rtree(100)
# Generate three datasets that are identical except for the name of the third column.
dt <- data.frame(id=tr$tip.label, value=abs(rnorm(100)), group=c(rep("A",50),rep("B",50)))
df <- dt
dtf <- dt
colnames(df)[[3]] <- "group2"
colnames(dtf)[[3]] <- "group3"
# plot tree
p <- ggtree(tr, layout="fan", open.angle=0)
#> Scale for 'y' is already present. Adding another scale for 'y', which will
#> replace the existing scale.
p
# the first ring.
p1 <- p +
geom_fruit(
data=dt,
geom=geom_bar,
mapping=aes(y=id, x=value, fill=group),
orientation="y",
stat="identity"
) +
new_scale_fill()
p1
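Each bar ring uses geom_bar with orientation="y" and stat="identity": the tip IDs sit on the y axis and each bar grows along x to its value. Outside a tree, the same geom settings give an ordinary horizontal bar chart, as in this sketch with a hypothetical three-row table; geom_fruit takes care of aligning the y values with the tips.

```r
library(ggplot2)

# Hypothetical per-tip abundances.
d <- data.frame(ID = c("tip1", "tip2", "tip3"), Abundance = c(4, 9, 2))

# orientation = "y": categories on y, bar length taken from x as-is.
p <- ggplot(d, aes(y = ID, x = Abundance)) +
  geom_bar(stat = "identity", orientation = "y")
p
```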
# the second ring
# geom_fruit_list takes layers that are drawn at the same position; the first element must be a geom_fruit layer.
p2 <- p1 +
geom_fruit_list(
geom_fruit(
data = df,
geom = geom_bar,
mapping = aes(y=id, x=value, fill=group2),
orientation = "y",
stat = "identity"
),
scale_fill_manual(values=c("blue", "red")), # To map group2
new_scale_fill(), # To initialize fill scale.
geom_fruit(
data = dt,
geom = geom_star,
mapping = aes(y=id, x=value, fill=group),
size = 2.5,
color = NA,
starstroke = 0
)
) +
new_scale_fill()
p2
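The difference geom_fruit_list makes can be sketched by building the same two layers both ways on a small hypothetical tree: chained geom_fruit calls each get their own ring, while wrapping them in geom_fruit_list places them in one shared ring, as in p2 above.

```r
library(ggtree)
library(ggtreeExtra)
library(ggplot2)

set.seed(1)
tr <- ape::rtree(20)
d <- data.frame(id = tr$tip.label, value = abs(rnorm(20)))
base <- ggtree(tr, layout = "fan", open.angle = 10)

# Two chained geom_fruit calls: the points go in a second, outer ring.
p_rings <- base +
  geom_fruit(data = d, geom = geom_bar, mapping = aes(y = id, x = value),
             orientation = "y", stat = "identity") +
  geom_fruit(data = d, geom = geom_point, mapping = aes(y = id, x = value))

# The same two layers in geom_fruit_list: points drawn over the bars.
p_shared <- base +
  geom_fruit_list(
    geom_fruit(data = d, geom = geom_bar, mapping = aes(y = id, x = value),
               orientation = "y", stat = "identity"),
    geom_fruit(data = d, geom = geom_point, mapping = aes(y = id, x = value))
  )
```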
# The third ring
p3 <- p2 +
geom_fruit(
data=dtf,
geom=geom_bar,
mapping = aes(y=id, x=value, fill=group3),
orientation = "y",
stat = "identity"
) +
scale_fill_manual(values=c("#00AED7", "#009E73"))
p3
4. Need help?
If you have questions or issues, please visit the GitHub issue tracker. You can also post to the Google group. Users are highly recommended to subscribe to the mailing list.
5. Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggnewscale_0.4.5 treeio_1.14.3 ggtree_2.4.1 ggplot2_3.3.3
#> [5] ggstar_1.0.1 ggtreeExtra_1.0.2
#>
#> loaded via a namespace (and not attached):
#> [1] prettydoc_0.4.1 tidyselect_1.1.0 xfun_0.22
#> [4] bslib_0.2.4 purrr_0.3.4 lattice_0.20-41
#> [7] colorspace_2.0-0 vctrs_0.3.6 generics_0.1.0
#> [10] htmltools_0.5.1.1 yaml_2.2.1 utf8_1.2.1
#> [13] rlang_0.4.10 jquerylib_0.1.3 pillar_1.5.1
#> [16] glue_1.4.2 withr_2.4.1 DBI_1.1.1
#> [19] rvcheck_0.1.8 lifecycle_1.0.0 stringr_1.4.0
#> [22] munsell_0.5.0 gtable_0.3.0 evaluate_0.14
#> [25] labeling_0.4.2 knitr_1.31 parallel_4.0.4
#> [28] fansi_0.4.2 highr_0.8 Rcpp_1.0.6
#> [31] scales_1.1.1 BiocManager_1.30.10 debugme_1.1.0
#> [34] jsonlite_1.7.2 farver_2.1.0 gridExtra_2.3
#> [37] digest_0.6.27 aplot_0.0.6 stringi_1.5.3
#> [40] dplyr_1.0.5 grid_4.0.4 tools_4.0.4
#> [43] magrittr_2.0.1 sass_0.3.1 lazyeval_0.2.2
#> [46] patchwork_1.1.1 tibble_3.1.0 tidyr_1.1.3
#> [49] crayon_1.4.1 ape_5.4-1 pkgconfig_2.0.3
#> [52] tidytree_0.3.3 ellipsis_0.3.1 assertthat_0.2.1
#> [55] rmarkdown_2.7 R6_2.5.0 nlme_3.1-152
#> [58] compiler_4.0.4
6. References
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Yu, Guangchuang. 2020. “Using Ggtree to Visualize Data on Tree-Like Structures.” Current Protocols in Bioinformatics 69 (1): e96. https://doi.org/10.1002/cpbi.96.
Yu, Guangchuang, Tommy Tsan-Yuk Lam, Huachen Zhu, and Yi Guan. 2018. “Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree.” Molecular Biology and Evolution 35 (12): 3041–3. https://doi.org/10.1093/molbev/msy194.
Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210X.12628.