Integration of multi-omics data with graph convolutional networks to identify cancer-associated genes

Schulte-Sasse, Roman

Metadata

dc.contributor.author

Schulte-Sasse, Roman

dc.date.accessioned

2021-09-30T11:19:47Z

dc.date.available

2021-09-30T11:19:47Z

dc.date.issued

2021

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/31311

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-31047

dc.description.abstract

Cancer is thought to arise from the accumulation of genetic changes in the DNA of the patient. Mutations can occur during replication of cells or from external factors. Given the current knowledge of gene regulation, it is not yet possible to link cancer phenotypes directly to the genetic alterations. Despite the vast increase of available high-throughput molecular data, the in silico identification of disease genes for multi-factorial diseases such as cancer is still a challenging task. Perturbation of entire modules in cellular networks, and genetic as well as non-genetic gene alterations, contribute to tumorigenesis. This necessitates the development of predictive models able to effectively integrate and process different data modalities. Most approaches cannot combine multi-dimensional molecular data with gene-gene interactions, and the few methods that do so are hard to interpret. In this thesis, I introduce EMOGI, an explainable machine learning method based on Graph Convolutional Networks (GCNs) to predict cancer genes by combining multi-omics data, such as mutations, copy number changes, DNA methylation and gene expression profiles across different cancers, together with Protein-Protein Interaction (PPI) networks. By profiting from different data representations, EMOGI was more accurate than previous methods in predicting known cancer genes, with an average increase in area under the precision-recall curve of 3% to 37% across different PPI networks and data sets. We applied the Layer-Wise Relevance Propagation (LRP) technique to learn the molecular features that contributed to the classification of each individual cancer gene. We also identified relevant cancer modules in the PPI network, and stratified genes according to whether their classification was mainly driven by the interactome, mutation rate or alterations in either DNA methylation or gene expression. We propose a new high-confidence list of 165 putative novel cancer genes which do not harbour recurrent alterations, but rather participate in PPIs with well-known cancer drivers. We functionally validated those novel predictions with publicly available loss-of-function screens. We believe that our results might open new diagnostic and therapeutic avenues in precision oncology, and that our method can be applied to predict biomarkers for other complex diseases.
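
The abstract describes combining per-gene omics features with a PPI network through graph convolutions. As a rough illustration of that general idea only, the sketch below shows one generic graph convolution step in NumPy, using the common symmetric normalization with self-loops. It is not the EMOGI implementation; the toy network, the feature and weight matrices, and the function names `normalize_adjacency` and `gcn_layer` are all hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each gene keeps its own signal
    deg = a_hat.sum(axis=1)              # degree including the self-loop, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: average over neighbours, project linearly, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical toy PPI network with 4 genes and edges 0-1, 1-2, 2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # assumed: 3 omics-derived features per gene
weights = rng.normal(size=(3, 2))    # assumed: untrained projection to 2 hidden units

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)   # shape (4, 2), one row per gene
```

In a trained model the weights would be learned from labelled genes and several such layers would be stacked; here they are random and serve only to show the data flow.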

dc.format.extent

ix, 190 pages

dc.language

eng

dc.rights.uri

http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen

dc.subject

cancer

dc.subject

machine learning

dc.subject

graph convolutional networks

dc.subject

bioinformatics

dc.subject

deep learning

dc.subject.ddc

000 Computer science, information, and general works::000 Computer Science, knowledge, systems::000 Computer science, information, and general works

dc.title

Integration of multi-omics data with graph convolutional networks to identify cancer-associated genes

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

male

dc.contributor.firstReferee

Marsico, Annalisa

dc.contributor.furtherReferee

Markowetz, Florian

dc.date.accepted

2021-06-30

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-31311-1

dc.title.translated

Integration von verschiedenen molekularen Daten mit Konvolutionsnetzen für Graphen verbessert die Identifikation von Krebs-assoziierten Genen

refubium.affiliation

Mathematik und Informatik

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

dcterms.accessRights.proquest

The document appears in:

Dissertationen FU

Files in this item

thesis_roman_schultesasse.pdf

Size: 38.22MB

Format: PDF

Checksum (MD5): d9d12433934f3411425c7907bcb27428
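
A downloaded copy of the file can be compared against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` are illustrative, and the local path passed to them is an assumption about where the PDF was saved.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "d9d12433934f3411425c7907bcb27428"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("thesis_roman_schultesasse.pdf")` would return `True` for an intact download saved under that name in the current directory. MD5 is suitable here for detecting transfer errors, not for guarding against deliberate tampering.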

Refubium - Repository of the Freie Universität Berlin