Mass spectrometry-based proteomics makes it possible to take a snapshot of the state of cells at the protein level. In particular, it allows the entire proteome to be studied in a high-throughput manner with high sensitivity. It can furthermore provide insights into peptidoforms, which are peptides that carry sequence variations or post-translational modifications. These play a major role in cell signaling and gene regulation, but can also be linked to certain disorders. Studying peptidoforms and proteoforms therefore provides fine-grained phenotypic insights of high biological relevance. However, identifying peptidoforms with mass spectrometry-based proteomics is challenging for at least two reasons. First, the underlying tandem mass spectra contain additional, rare, and potentially hard-to-simulate patterns. Second, contextual knowledge in the form of reference proteomes, which initially contain canonical reference proteins, cannot be extended indefinitely to account for all potential peptidoforms, because enumerating all possible combinations of modifications and sequence variations is intractable due to combinatorial complexity. Hence, this thesis provides new deep learning methods that improve the identification of peptidoforms in mass spectrometry-based proteomics while remaining interpretable. First, I introduce AHLF, a new end-to-end trained deep learning model that predicts modified peptides based on their fragmentation tandem mass (MS/MS) spectra. I show that AHLF’s prediction score boosts peptide identification rates in phosphoproteomics and for cross-linked peptides. AHLF is a temporal convolutional neural network (TCN), which I trained on 19.2 million historical peptide-to-spectrum matches. To interpret AHLF, I estimate feature importances per peak and per spectrum. 
These peak-level importances show that AHLF indeed focuses on peptide-specific fragment ions. Additionally, I investigate the prediction performance across data of varying quality, where the variation arises from instrument type, resolution, and dissociation method. In the second part of this work, I present yHydra, a foundation model that I trained on nearly 20 million peptides and MS/MS spectra. I designed yHydra as a foundation model so that it can serve various downstream tasks. In particular, I demonstrate that yHydra is able to perform closed, open, and error-tolerant searching. Using yHydra, I demonstrate error-tolerant searching of antibody peptide sequences and cross-species searching of chimpanzee plasma samples. Lastly, I demonstrate the explainability of yHydra by visualizing the learned joint embedding of peptides and spectra, which reveals a learned manifold that is structured in concordance with the physico-chemical properties of the embedded peptides.
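
The tractability argument made earlier in this abstract, namely that reference proteomes cannot be extended to cover every peptidoform, can be illustrated with a small counting sketch. It assumes that every modifiable residue is modified independently of the others; the modification table `MODS_PER_RESIDUE` and the function `count_peptidoforms` are hypothetical names with illustrative values and are not taken from the thesis.

```python
from math import prod

# Hypothetical table: residue -> number of alternative modifications.
# Illustrative values only (e.g. one phosphorylation state for S/T/Y).
MODS_PER_RESIDUE = {"S": 1, "T": 1, "Y": 1, "M": 1, "K": 2}


def count_peptidoforms(peptide, mods_per_residue=MODS_PER_RESIDUE):
    """Count modification combinations for one peptide.

    Each residue contributes (1 + number of modifications) choices,
    the extra 1 being the unmodified state. Sites are assumed to be
    independent, so the choices multiply.
    """
    return prod(1 + mods_per_residue.get(residue, 0) for residue in peptide)


if __name__ == "__main__":
    for peptide in ["PEPTIDE", "SAMPLEK", "STYSTYMKK"]:
        print(peptide, count_peptidoforms(peptide))
```

Even with this small table, the count grows multiplicatively with the number of modifiable sites, and sequence variations would multiply it further.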