This is a demo for using the GeRnika
R package. This
document contains examples to help any user to understand the usage of
the functionalities offered by the package, which include the simulation
of tumor clonal data and the visualization and comparison of tumor
phylogenies.
The simulation of tumor clonal data consists of simulating a matrix \(\boldsymbol{F}\) that contains the mutation frequency values of a series of mutations in a collection of tumor biopsies or samples. The matrix \(\boldsymbol{F}\) is calculated as the product of a matrix \(\boldsymbol{B}\) that represents the phylogeny of the tumor, and a matrix \(\boldsymbol{U}\), which contains the clone proportions in each particular sample of that tumor.
Tumor data can be simulated through the create_instance
function. The information about its parameters and their usage may be
checked in the following table:
Parameter | Description | Type |
---|---|---|
n |
Number of mutations/clones (\(n\)). | Natural number |
m |
Number of samples (\(s\)). | Natural number |
k |
Topology parameter that controls for the linearity of the topology. | Positive rational number |
selection |
Evolution model followed by the tumor. | Categorical: “positive” or “neutral” |
noisy |
Add sequencing noise to the values in the \(F\) matrix. | Boolean |
depth |
(only if noise = TRUE ) Average number of
reads that map to the same locus, also known as sequencing depth. |
Natural number |
seed |
Seed for the pseudo-random topology generator | Real number |
The following is an example of the generation of a noise-free
instance of a tumor that is composed of 5 clones/mutations that has
evolved under neutral evolution and has a k
value of 0.5. 5
samples have been taken from it:
I <- create_instance(n = 5, m = 4, k = 0.5, selection = "neutral", seed = 1)
This method returns the previously mentioned \(\boldsymbol{F}\), \(\boldsymbol{B}\) and \(\boldsymbol{U}\) matrices and an additional \(\boldsymbol{F_{true}}\) matrix, which we will describe later.
Once we have shown an example of the instantiation of a tumor, we will analyze the effect of varying the values of the parameters.
k
k
is the parameter that controls for the linearity of
the topology of the tumor. As a result, increasing values of
k
lead to rather branched phylogenies, while lower values
of k
produce trees that tend to linearity.
# Simulate a tumor with k=0:
I1 <- create_instance(n = 5, m = 4, k = 0, selection = "neutral", seed = 1)
# Simulate a tumor with k=1:
I2 <- create_instance(n = 5, m = 4, k = 8, selection = "neutral", seed = 1)
# Create a `Phylotree` class object for each tumor:
tree1 <- B_to_phylotree(B = I1$B)
tree2 <- B_to_phylotree(B = I2$B)
# Plot both trees
DiagrammeR::render_graph(DiagrammeR::combine_graphs(data.tree::ToDiagrammeRGraph(tree1@tree), data.tree::ToDiagrammeRGraph(tree2@tree)))
Following the above, the tree on the left is fully branched as it is composed by a root connected to all the leaves of the tree. On the right side we can see a more linear tree, with just two main branches.
After analyzing the effect of parameter k
in the
generation of tumor data, we will proceed to analyze the effect of the
tumor evolution model.
With this parameter we control for the evolution model the tumor follows, either positive selection or neutral evolution. A positive selection-driven evolution model assumes that a few mutations provide cell growth advantage, whereas the remaining mutations do not. Conversely, neutral evolution models assume that, in general, none of the mutations provide significant fitness advantage. As a consequence, tumors under positive selection are dominated by a few clones whereas the rest of clones are present in small proportions. Instead, in tumors with a neutral evolution, all the clones are present in similar proportions.
The clone proportions are well observed in the \(\boldsymbol{U}\) matrix, as this indeed contains the fraction of each clone in each sample of the tumor. Below, we can see this effect with a few examples:
# Function to create the heatmap of the U matrix
U_to_heatmap <- function(U, values = TRUE, col_names = c("samples", "clones", "proportion")){
Upos <-reshape2::melt(U)
colnames(Upos) <- col_names
Var1 <- col_names[1]
Upos<-ggplot(Upos, aes(x = samples, y = clones, fill = proportion)) + geom_tile(col = "black") +
theme(
panel.background = element_rect(fill = 'transparent'),
plot.background = element_rect(fill = 'transparent', color = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.background = element_rect(fill = 'transparent'),
legend.box.background = element_rect(fill = 'transparent')
) + scale_fill_gradient(limits = c(0.0000000000001, 1)) + theme(axis.title.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank())
if(values){
Upos <- Upos + geom_text(aes(label = proportion), size = 4)
}
Upos
}
# simulate a tumor with neutral selection:
Ipos <- create_instance(n = 5, m = 8, k = 0.5, selection = "positive", seed = 1)
# simulate a tumor with positive selection:
Ineu <- create_instance(n = 5, m = 8, k = 0.5, selection = "neutral", seed = 1)
Upos <- U_to_heatmap(Ipos$U)
Uneu <- U_to_heatmap(Ineu$U)
ggarrange(plotlist = list(Upos, Uneu), ncol = 1, nrow = 2)
In that figure, we may see that in the tumor instance that evolves under neutral evolution, the majority of clones are present in all the samples, with only a few exceptions. Most of the clones have fraction values between 0.05 and 0.4. Conversely, the biggest fraction of the tumor instance under positive selection is taken by a few clones, in particular, by clone 3, and by clones 1 and 2 in a lower proportion. In addition, more clones than in the neutral evolution instance are absent; for instance, clone 5 is even missing in all the samples.
Once we have analyzed the difference between the neutral evolution and positive selection-driven evolution models, we will show how can noisy instances be generated and compare them to noise-free instances.
GeRnika
has a functionality to add sequencing noise to
the simulated instances. In practice, this is done by adding a few
parameters to the create_instance
function. Sequencing
noise is added on top of the noise-free \(\boldsymbol{F}\) matrix, and the \(U\) and \(\boldsymbol{B}\) matrices suffer no
changes. The F
slot in the resulting object contains the
noisy \(\boldsymbol{F}\) matrix, while
the F_true
slot consists of the original noise-free \(\boldsymbol{F}\) matrix. Note that these
two slots contain equal matrices when no sequencing noise is added.
Down below we compare an instance we have added sequencing noise to, with its noise-free counterpart.
# Function to create a heatmap of F
F_to_heatmap <- function(U, values = TRUE, col_names = c("samples", "mutations", "VAF")){
Upos <-reshape2::melt(U)
colnames(Upos) <- col_names
Var1 <- col_names[1]
Upos<-ggplot(Upos, aes(x = samples, y = mutations, fill = VAF)) + geom_tile(col = "black") +
theme(
panel.background = element_rect(fill = 'transparent'),
plot.background = element_rect(fill = 'transparent', color = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.background = element_rect(fill = 'transparent'),
legend.box.background = element_rect(fill = 'transparent')
) + scale_fill_gradient(limits = c(0.00000000000000000001, 1)) + theme(axis.title.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank())
if (values) {
Upos <- Upos + geom_text(aes(label = round(VAF, digits = 2)), size = 4)
}
Upos
}
# Simulate a tumor with sequencing noise added:
Inoisy <- create_instance(m = 5, n = 8, k = 0.5, selection = "neutral", noisy = TRUE, depth = 5, seed = 1)
Fnoise <- F_to_heatmap(abs(Inoisy$F - Inoisy$F_true))
ggarrange(Fnoise)
The heatmap from above shows the difference between the \(\boldsymbol{F}\) matrix and the \(\boldsymbol{F_{true}}\) matrix of a tumor instance, i.e. the noise added to the original VAF values of our samples. The amount of noise added is controlled by the depth parameter, which replicates the effect that the sequencing depth has on the noise level. This is explained in more detail in the following subsection.
Finally, the depth
parameter represents the sequencing
read depth, that is, the average number of reads that map to the same
locus (section of the genome). Therefore, the higher the sequencing
depth, the most accurate the VAF values and thus, the lower the noise
will be.
After analyzing the parameters for simulating tumor clonal data, we
will present the basis of the Phylotree
S4 class.
Phylotree
S4 classIn this section, we will be using the Phylotree
class
for the purpose of visualizing phylogenetic trees on the basis of
simulated tumor clonal data. The Phylotree
S4 class is a
structure that provides facilities for constructing tumor phylogenetic
trees. As every S4 R class, the Phylotree
class is composed
of various attributes that are essential for building a tumor
phylogenetic tree in an eficient way. Beware that the users of this
package will not need to manipulate the parameters of the
Phylotree
class as some of these exist only to reduce the
computational cost of the visualization of phylogenetic trees and their
comparison.
Phylotree
class objectsThe mutations of each clone in a tumor sample are represented in an
n
x n
binary genotype \(\boldsymbol{B}\) matrix. Each \(b_{i}\) row of the \(\boldsymbol{B}\) matrix represents the
mutations in clone \(v_{i}\). Note that
we just need a \(\boldsymbol{B}\)
matrix of a tumor simulation in order to instantiate a new
Phylotree
class object.
Now, we will show an example of the instantiation of a
Phylotree
class object:
# Simulate a tumor with 5 clones, 4 samples, k=0.5:
instance <- create_instance(n = 5, m = 4, k = 0.5, selection = "neutral", noisy = FALSE, seed = 1)
# The creation of the Phylotree class object using the previously generated B matrix:
phylotree <- B_to_phylotree(B = instance$B)
The B_to_phylotree
method takes the \(\boldsymbol{B}\) matrix of a tumor
simulation as its main argument and calculates the values for the other
attributes present in Phylotree
class objects. After
instantiating the new Phylotree
object, we can visualize it
using the generic plot
method for this class:
plot(phylotree)
Instead of clone numbers, the user can use any other labelling for the nodes in the tree. The example below illustrates how this can be done:
# Create a vector with the tags we want to use (the length of the
# vector must be equal to the number of clones in the phylogenetic tree):
tags <- c("mut1", "mut2", "mut3", "mut4", "mut5")
# Create the Phylotree class object and include the node labels:
phylotree <- B_to_phylotree(B = instance$B, labels = tags)
After creating the Phylotree
class object, we may render
it with the custom labels in the following way:
plot(phylotree, labels = TRUE)
Users are encouraged to use only the B_to_phylotree
method for instantiating Phylotree
class objects, as the
parameters of the class must fulfill various restrictions to work
properly.
When plotting a Phylotree
class object, its nodes can be
resized according to some given proportions of the clones that compose
the tumor samples. To do this, the plot_proportions
function takes a phylotree
class object and an additional
\(\boldsymbol{U}\) matrix determining
the proportions of the clones.
# Simulate a tumor
instance <- create_instance(n = 5, m = 4, k = 0.5, selection = "neutral", noisy = FALSE, seed = 1)
# Use the B matrix to instantiate a Phylotree class object
tree <- B_to_phylotree(B = instance$B)
# Plot the phylogenetic tree and resize the nodes according to the U matrix
plot_proportions(tree, instance$U)
Again, in this case we can also use custom labels for the nodes:
tags <- c("A", "B", "C", "D", "E")
# Simulate a tumor
instance <- create_instance(n = 5, m = 4, k = 0.5, selection = "neutral", noisy = FALSE, seed = 1)
# Use the B matrix of the previously generated tumor for creating a Phylotree class object
tree<-B_to_phylotree(B = instance$B, labels = tags)
# Plot the phylogenetic tree, resize the nodes according to the U matrix and include the node labels
plot_proportions(tree, instance$U, labels = TRUE)
Note that the proportions
parameter can be a matrix or a
vector. If a matrix is given to the function, this method will generate
one plot per row in the matrix. Conversely, if a vector is given, a
single plot will be generated.
Once we have shown the usage of the methods for instantiating
Phylotree
class objects and the procedures these can be
visualized by, we will present the functions for comparing different
tumor phylogenies.
GeRnika
presents different functionalities for comparing
tumor phylogenies. In order to show them, we will use the
B_mats
object included in the GeRnika
package.
This contains 10 \(\boldsymbol{B}\)
matrix trios. Each trio corresponds to a different instance, and the
\(\boldsymbol{B}\) matrices arise in
the course of solving the Clonal Deconvolution and Evolution Problem
(CDEP) for that instance, as follows:
These matrices are based on the solution of various instances of the Clonal Deconvolution and Evolution Problem given by the ILS and GRASP methods. This trios consist of the following slots:
B_true
: The real \(\boldsymbol{B}\) matrix of a tumor.B_Grasp
: A \(\boldsymbol{B}\) matrix generated througha
greedy, randomized adaptive heuristic strategy (GRASP) given an observed
F matrix for the \(\boldsymbol{B_{true}}\) matrixB_opt
: The optimal \(\boldsymbol{B}\) matrix found by an
iterated local search approach given an observed \(\boldsymbol{F}\) matrix for the \(\boldsymbol{B_{true}}\) matrix.The example below loads and plots the 3 \(\boldsymbol{B}\) matrices of an instance in
B_mats
:
# Load the predefined B matrices of the package:
B_mats <- GeRnika::B_mats
# Get one of the B matrix trios to compare them:
B_real <- B_mats[[2]]$B_real
B_opt <- B_mats[[2]]$B_opt
B_grasp <- B_mats[[2]]$B_grasp
#Create the list of the tags for the clones that compose the phylogenetic trees:
tags <- c("mut1", "mut2", "mut3", "mut4", "mut5", "mut6",
"mut7", "mut8", "mut9", "mut10")
#Create the Phylotree class objects using the previously loaded B matrices:
phylotree_real <- B_to_phylotree(B_real, labels = tags)
phylotree_opt <- B_to_phylotree(B_opt, labels = tags)
phylotree_grasp <- B_to_phylotree(B=B_grasp, labels=tags)
#Render both trees:
DiagrammeR::render_graph(DiagrammeR::combine_graphs(data.tree::ToDiagrammeRGraph(phylotree_real@tree),DiagrammeR::combine_graphs(data.tree::ToDiagrammeRGraph(phylotree_grasp@tree),data.tree::ToDiagrammeRGraph(phylotree_opt@tree))))
B_mats
.
As these trees above relate to the same tumor instance, it is
reasonable that they may present some commonalities. Now, we will show
the three different methods offered by the GeRnika
package
for comparing tumor phylogenies.
equals
methodThe three phylogenetic trees above are not equal. For example,
phylotree_real
and phylotree_opt
are not equal
as some of the edges of phylotree_real
do not exist in
phylotree_opt
and the other way around.
The equivalence between two phylogenetic trees may be checked by
using the equals
method as follows:
# Checking if phylotree_real is equal to itself:
equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_real)
#> [1] TRUE
# Checking if phylotree_real and phylotree_opt are equal:
equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_opt)
#> [1] FALSE
Equal phylogenetic trees, by definition, are composed by the same
nodes, connected by the same edges. As a result, this method returns
TRUE
when phylotree_real
is compared with
itself. Likewise, as phylotree_real
and
phylotree_opt
are not equal, this method returns
FALSE
when we check whether they are equal or not.
Nevertheless, even though two trees may not be equal, they may have common subtrees.
find_common_subtrees
methodThe find_common_subtrees
function calculates and plots
all the maximal common subtrees between two phylogenetic trees.
Moreover, it also outputs the number of common and independent (the ones
not shared by both trees) edges of the trees. Finally, the method
outputs the distance between the trees, which is computed as the sum of
their independent edges.
find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_grasp)
#> Independent edges of tree1: 6
#> Independent edges of tree2: 6
#> Common edges: 3
#> Distance: 12
phylotree_real
and
phylotree_grasp
.
find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_opt)
#> Independent edges of tree1: 3
#> Independent edges of tree2: 3
#> Common edges: 6
#> Distance: 6
phylotree_real
and
phylotree_opt
.
We observe that there are two maximal common subtrees between
phylotree_real
and phylotree_grasp
.
Contrariwise, phylotree_real
and phylotree_opt
present a single maximal common subtree that covers the biggest part of
both trees.
As before, we can add custom labels to the trees that result from this function call:
find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_grasp, labels = TRUE)
#> Independent edges of tree1: 6
#> Independent edges of tree2: 6
#> Common edges: 3
#> Distance: 12
phylotree_real
and
phylotree_grasp
labelled using custom tags.
find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_opt, labels = TRUE)
#> Independent edges of tree1: 3
#> Independent edges of tree2: 3
#> Common edges: 6
#> Distance: 6
phylotree_real
and
phylotree_opt
labelled using custom tags.
According to the plots from above, we conclude that
phylotree_real
shares more edges with
phylotree_opt
than it does with
phylotree_grasp
.
GeRnika
offers an additional method to compare two
phylogenetic trees: a consensus tree that shows the union of the nodes
and edges of both trees.
The combine_trees
function creates a tree that results
from the union of the nodes and edges of two trees. We name this the
consensus tree. In this, the nodes and the edges that compose the common
subtrees between the original trees are blue. In addition, yellow edges
denote to the independent edges of the tree passed as the first
parameter of the method, while orange edges represent the independent
edges of the second tree. Note that the independent edges of both trees
are presented with more translucent colors.
# Creating the consensus tree between phylotree_real and phylotree_grasp
consensus_real_grasp <- combine_trees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_grasp)
# Creating the consensus tree between phylotree_real and phylotree_opt
consensus_real_opt <- combine_trees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_opt)
The consensus tree can be plotted by using the function
render_graph
of the DiagrammeR
R package.
# Rendering the consensus between phylotree_real and phylotree_grasp
DiagrammeR::render_graph(consensus_real_grasp)
phylotree_real
and
phylotree_grasp
.
# Rendering the consensus between phylotree_real and phylotree_opt
DiagrammeR::render_graph(consensus_real_opt)
phylotree_real
and
phylotree_opt
.
The above figures present the consensus tree between
phylotree_real
and phylotree_opt
and the
consensus tree between phylotree_real
and
phylotree_grasp
, respectively.
Users can customize both the labels and the colors of the nodes. The
latter can be done by providing a custom palette –a vector containing
the hexadecimal code of various colors– of 3 elements such as
c("#AAAAAAAA","#BBBBBBBB","#CCCCCCCC")
. Additionally, the
user can use one of the color palettes included in GeRnika:
Lancet, NEJM and Simpsons (default).
# Load one of the default palettes of the package:
palette <- GeRnika::palettes$Lancet
# Create the consensus tree between phylotree_real and phylotree_opt
# using clone tags and the custom palette:
consensus <- combine_trees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_opt,
labels = TRUE, palette = palette)
# Render the new consensus phylogenetic tree:
DiagrammeR::render_graph(consensus)
phylotree_real
and
phylotree_opt
using tags and a selected color palette.
This is the information of the system this document was compiled on:
sessionInfo(package = NULL)
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=Spanish_Spain.utf8
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.utf8
#>
#> time zone: Europe/Madrid
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] knitr_1.48 ggpubr_0.6.0 ggplot2_3.5.1 GeRnika_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.5 xfun_0.47 bslib_0.8.0
#> [4] htmlwidgets_1.6.4 visNetwork_2.1.2 rstatix_0.7.2
#> [7] vctrs_0.6.5 tools_4.4.1 generics_0.1.3
#> [10] tibble_3.2.1 fansi_1.0.6 highr_0.11
#> [13] RefManageR_1.4.0 pkgconfig_2.0.3 RColorBrewer_1.1-3
#> [16] lifecycle_1.0.4 compiler_4.4.1 farver_2.1.2
#> [19] stringr_1.5.1 munsell_0.5.1 data.tree_1.1.0
#> [22] knitcitations_1.0.12 carData_3.0-5 htmltools_0.5.8.1
#> [25] sass_0.4.9 yaml_2.3.10 pillar_1.9.0
#> [28] car_3.1-2 jquerylib_0.1.4 tidyr_1.3.1
#> [31] cachem_1.1.0 abind_1.4-5 tidyselect_1.2.1
#> [34] digest_0.6.37 stringi_1.8.4 dplyr_1.1.4
#> [37] reshape2_1.4.4 purrr_1.0.2 labeling_0.4.3
#> [40] cowplot_1.1.3 bibtex_0.5.1 fastmap_1.2.0
#> [43] grid_4.4.1 colorspace_2.1-1 cli_3.6.3
#> [46] magrittr_2.0.3 DiagrammeR_1.0.11 utf8_1.2.4
#> [49] broom_1.0.6 withr_3.0.1 scales_1.3.0
#> [52] backports_1.5.0 lubridate_1.9.3 timechange_0.3.0
#> [55] rmarkdown_2.28 httr_1.4.7 ggsignif_0.6.4
#> [58] evaluate_0.24.0 rlang_1.1.4 Rcpp_1.0.13
#> [61] glue_1.7.0 xml2_1.3.6 rstudioapi_0.16.0
#> [64] jsonlite_1.8.8 R6_2.5.1 plyr_1.8.9