dc.contributor.author
Shao, Borong
dc.date.accessioned
2018-07-19T09:39:43Z
dc.date.available
2018-07-19T09:39:43Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/22487
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-294
dc.description.abstract
Network-based feature selection methods for omics data have been developed in recent years. Their performance gain, however, has been shown to depend on the datasets, networks, and evaluation metrics used. The reproducibility and robustness of the resulting biomarkers still need to be improved. One of the major challenges in this endeavor is the curse of dimensionality.
To mitigate this issue, we proposed the Phenotype Relevant Network-based Feature Selection (PRNFS) framework. By employing a much smaller but phenotype-relevant network, we could avoid irrelevant information and select robust molecular signatures. The advantages of PRNFS were demonstrated by applying it to lung cancer prognosis prediction. Specifically, we constructed epithelial-mesenchymal transition (EMT) networks and employed them for feature selection. We mapped multiple types of omics data onto these networks, one type at a time, to select single-omics signatures, and then integrated these into multi-omics signatures. We then introduced a multiplex network-based feature selection method to select multi-omics signatures directly. Both single-omics and multi-omics EMT signatures were evaluated on TCGA data as well as on an independent multi-omics dataset.
The results showed that EMT signatures achieved a significant performance gain, even though the EMT networks covered less than 2.5% of the original data dimensions. Frequently selected EMT features achieved average AUC values of 0.83 on TCGA data. Applying the EMT signatures to the independent dataset stratified the patients into significantly different prognostic groups. Multi-omics features outperformed single-omics features on both the TCGA data and the independent data.
Additionally, we tested the performance of several relational and non-relational databases for storing and retrieving omics data. Since biological data have large volume, high velocity, and wide variety, database systems are needed that meet the requirements of integrative omics data analysis. Based on the results, we provided some recommendations for building scalable omics data infrastructures.
en
dc.format.extent
vi, 186 Seiten
de
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
de
dc.subject
Feature selection
en
dc.subject
Data integration
en
dc.subject
Cancer prognosis
en
dc.subject
Epithelial Mesenchymal Transition
en
dc.subject
Survival analysis
en
dc.subject.ddc
000 Computer science, information, and general works::000 Computer Science, knowledge, systems::000 Computer science, information, and general works
de
dc.title
Phenotype Relevant Network-based Biomarker Discovery Integrating Multiple Omics Data
de
dc.contributor.gender
female
de
dc.contributor.firstReferee
Conrad, Tim
dc.contributor.furtherReferee
Klau, Gunnar
dc.date.accepted
2018-07-09
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-22487-4
dc.title.subtitle
EMT Network-based Lung Cancer Prognosis Prediction
de
refubium.affiliation
Mathematik und Informatik
de
dcterms.accessRights.dnb
free
de
dcterms.accessRights.openaire
open access