dc.contributor.author
Rams, Mona
dc.date.accessioned
2022-11-10T11:38:33Z
dc.date.available
2022-11-10T11:38:33Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/36752
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-36465
dc.description.abstract
The era of high-throughput data generation provides new access to biomolecular profiles and new ways to exploit them. However, the analysis of such biomolecular data, for example transcriptomic data, suffers from the so-called "curse of dimensionality", which arises when a dataset has far more variables than data points. As a consequence, models can overfit and unintentionally learn patterns that are unrelated to the process under study, which can lead to results that are not significant in application. A common way to counteract this problem is to apply a dimension reduction method and then analyse the resulting low-dimensional representation, which has fewer variables.
In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, an unsupervised dimension reduction approach. Unlike many dimension reduction approaches that are widely applied in transcriptomic data analysis, Dictionary learning does not impose constraints on the components to be derived, which allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. Sparse methods yield a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation, and they exploit the fact that the analysed datasets are highly structured. Transcriptomic data are indeed highly structured, for example through the connections between genes and pathways. Nonetheless, the application of Dictionary learning in medical data analysis has so far been mainly restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach; interpretability is a necessity in biomolecular data analysis for gaining a holistic understanding of the investigated processes.
Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups of samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods that are widely applied in transcriptomic data analysis. Our methods achieve high performance and overall outperform the comparison methods.
en
dc.format.extent
l, 167 pages
dc.rights.uri
https://creativecommons.org/licenses/by-nc/4.0/
dc.subject
Dictionary learning
en
dc.subject
Transcriptomic
en
dc.subject
Machine learning
en
dc.subject
Applied Mathematics
en
dc.subject
Dimension reduction
en
dc.subject.ddc
500 Natural sciences and mathematics::510 Mathematics::519 Probabilities and applied mathematics
dc.title
New approaches for unsupervised transcriptomic data analysis based on Dictionary learning
dc.contributor.gender
female
dc.contributor.firstReferee
Conrad, Tim
dc.contributor.furtherReferee
Renard, Bernhard
dc.date.accepted
2022-10-25
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-36752-1
refubium.affiliation
Mathematik und Informatik
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access
dcterms.accessRights.proquest
accept