dc.contributor.author
Weckbecker, Moritz
dc.contributor.author
Anžel, Aleksandar
dc.contributor.author
Yang, Zewen
dc.contributor.author
Hattab, Georges
dc.date.accessioned
2024-07-03T11:34:40Z
dc.date.available
2024-07-03T11:34:40Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/44087
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-43797
dc.description.abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
en
dc.format.extent
11 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Interpretable
en
dc.subject
Molecular encoding
en
dc.subject
Representation
en
dc.subject
Machine learning
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; Biology::570 Life sciences; Biology
dc.title
Interpretable molecular encodings and representations for machine learning tasks
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1016/j.csbj.2024.05.035
dcterms.bibliographicCitation.journaltitle
Computational and Structural Biotechnology Journal
dcterms.bibliographicCitation.pagestart
2326
dcterms.bibliographicCitation.pageend
2336
dcterms.bibliographicCitation.volume
23
dcterms.bibliographicCitation.url
https://doi.org/10.1016/j.csbj.2024.05.035
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Mathematik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2001-0370
refubium.resourceType.provider
WoS-Alert