dc.contributor.author
Schulte-Sasse, Roman
dc.date.accessioned
2021-09-30T11:19:47Z
dc.date.available
2021-09-30T11:19:47Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/31311
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-31047
dc.description.abstract
Cancer is thought to arise from the accumulation of genetic changes in the DNA of the patient. Mutations can occur during cell replication or be caused by external factors. Given the current knowledge of gene regulation, it is not yet possible to link cancer phenotypes directly to genetic alterations. Despite the vast increase in available high-throughput molecular data, the in silico identification of disease genes for multifactorial diseases such as cancer remains challenging. Perturbations of entire modules in cellular networks, as well as genetic and non-genetic gene alterations, contribute to tumorigenesis. This necessitates the development of predictive models that can effectively integrate and process different data modalities. Most approaches cannot combine multi-dimensional molecular data with gene-gene interactions, and the few methods that do are hard to interpret.
In this thesis, I introduce EMOGI, an explainable machine learning method based on Graph Convolutional Networks (GCNs) that predicts cancer genes by combining multi-omics data (such as mutations, copy number changes, DNA methylation and gene expression profiles across different cancers) with Protein-Protein Interaction (PPI) networks. By exploiting these different data representations, EMOGI was more accurate than previous methods in predicting known cancer genes, with an average increase in the area under the precision-recall curve of 3% to 37% across different PPI networks and data sets. We applied the Layer-Wise Relevance Propagation (LRP) technique to identify the molecular features that contributed to the classification of each individual cancer gene. We also identified relevant cancer modules in the PPI network and stratified genes according to whether their classification was mainly driven by the interactome, the mutation rate, or alterations in DNA methylation or gene expression. We propose a new high-confidence list of 165 putative novel cancer genes that do not harbour recurrent alterations but instead interact with well-known cancer drivers in the PPI network. We functionally validated these novel predictions with publicly available loss-of-function screens. We believe that our results might open new diagnostic and therapeutic avenues in precision oncology, and that our method can be applied to predict biomarkers for other complex diseases.
en
dc.format.extent
ix, 190 pages
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject
machine learning
en
dc.subject
graph convolutional networks
en
dc.subject
bioinformatics
en
dc.subject
deep learning
en
dc.subject.ddc
000 Computer science, information, and general works::000 Computer Science, knowledge, systems::000 Computer science, information, and general works
dc.title
Integration of multi-omics data with graph convolutional networks to identify cancer-associated genes
dc.contributor.gender
male
dc.contributor.firstReferee
Marsico, Annalisa
dc.contributor.furtherReferee
Markowetz, Florian
dc.date.accepted
2021-06-30
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-31311-1
dc.title.translated
Integration von verschiedenen molekularen Daten mit Konvolutionsnetzen für Graphen verbessert die Identifikation von Krebs-assoziierten Genen
de
refubium.affiliation
Mathematik und Informatik
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access
dcterms.accessRights.proquest
accept