Interpretable Deep Learning Approaches for the Robust Identification of Peptidoforms in Mass Spectrometry-based Proteomics

Altenburg, Tom

Metadata

dc.contributor.author

Altenburg, Tom

dc.date.accessioned

2025-04-17T12:11:29Z

dc.date.available

2025-04-17T12:11:29Z

dc.date.issued

2025

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/47319

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-47037

dc.description.abstract

Mass spectrometry-based proteomics makes it possible to take a snapshot of the state of cells at the protein level. In particular, it allows the entire proteome to be studied in a high-throughput manner with high sensitivity. It can furthermore provide insights into peptidoforms, which are peptides that carry sequence variations or post-translational modifications. These play a major role in cell signaling and gene regulation, but can also be linked to certain disorders. Studying peptidoforms and proteoforms therefore provides fine-grained phenotypic insights of high biological relevance. However, identifying peptidoforms with mass spectrometry-based proteomics is challenging for at least two reasons. First, the underlying tandem mass spectra contain additional, rare, and potentially hard-to-simulate patterns. Second, contextual knowledge in the form of reference proteomes, which initially contain canonical reference proteins, cannot be extended indefinitely to account for all potential peptidoforms, because considering all possible combinations of modifications and sequence variations is intractable due to combinatorial complexity. Hence, this thesis provides new deep learning methods that improve the identification of peptidoforms in mass spectrometry-based proteomics while being interpretable. First, I introduce AHLF, a new end-to-end trained deep learning model that predicts modified peptides based on their fragmentation tandem mass (MS/MS) spectra. I show that AHLF’s prediction score boosts peptide identification rates in the context of phosphoproteomics and for cross-linked peptides. AHLF is a temporal convolutional neural network (TCN), which I trained on 19.2 million historical peptide-to-spectrum matches. To interpret AHLF, I estimate feature importances per peak and per spectrum.
These peak-level importances show that AHLF indeed focuses on peptide-specific fragment ions. Additionally, I investigate the prediction performance across data of varying quality, where the variation arises from instrument type, resolution, and dissociation method. In the second part of this work, I present yHydra, a foundation model that I trained on nearly 20 million peptides and MS/MS spectra. I designed yHydra as a foundation model so that it can support various downstream sub-tasks. In particular, I demonstrate that yHydra can perform closed, open, and error-tolerant searching. Using yHydra, I demonstrate error-tolerant searching of antibody peptide sequences and cross-species searching of chimpanzee plasma samples. Lastly, I demonstrate the explainability of yHydra by visualizing the learned joint embedding of peptides and spectra, which reveals a learned manifold that is structured in accordance with the physico-chemical properties of the embedded peptides.

dc.format.extent

vi, 129 pages

dc.language

eng

dc.rights.uri

https://creativecommons.org/licenses/by-nd/4.0/

dc.subject

Proteomics

dc.subject

Deep Learning

dc.subject

Machine Learning

dc.subject.ddc

500 Natural sciences and mathematics::500 Natural sciences::500 Natural sciences and mathematics

dc.title

Interpretable Deep Learning Approaches for the Robust Identification of Peptidoforms in Mass Spectrometry-based Proteomics

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

male

dc.contributor.firstReferee

Renard, Bernhard Y.

dc.contributor.furtherReferee

Wilhelm, Mathias

dc.date.accepted

2025-03-20

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-47319-7

refubium.affiliation

Mathematik und Informatik

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

This Item appears in the following Collection(s)

Dissertationen FU

Files in This Item

Dissertation_Altenburg.pdf

Size: 15.28MB

Format: PDF

Checksum (MD5): bf80ab7ea1c99cbb791f95b4ead1973f

Refubium - Freie Universität Berlin Repository
