dc.contributor.author
Krenn, Mario
dc.contributor.author
Ai, Qianxiang
dc.contributor.author
Barthel, Senja
dc.contributor.author
Carson, Nessa
dc.contributor.author
Frei, Angelo
dc.contributor.author
Frey, Nathan C.
dc.contributor.author
Friederich, Pascal
dc.contributor.author
Gaudin, Théophile
dc.contributor.author
Gayle, Alberto Alexander
dc.contributor.author
Moosavi, Seyed Mohamad
dc.date.accessioned
2023-02-20T12:01:50Z
dc.date.available
2023-02-20T12:01:50Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/38009
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-37725
dc.description.abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings: most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
en
dc.format.extent
27 pages
dc.rights.uri
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
Artificial intelligence
en
dc.subject
machine learning
en
dc.subject
molecular string representations
en
dc.subject.ddc
000 Computer science, information science, general works::000 Computer science, knowledge, systems::004 Data processing; computer science
dc.title
SELFIES and the future of molecular string representations
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
100588
dcterms.bibliographicCitation.doi
10.1016/j.patter.2022.100588
dcterms.bibliographicCitation.journaltitle
Patterns
dcterms.bibliographicCitation.number
9
dcterms.bibliographicCitation.volume
3
dcterms.bibliographicCitation.url
https://doi.org/10.1016/j.patter.2022.100588
refubium.affiliation
Mathematics and Computer Science
refubium.affiliation.other
Institute of Mathematics
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2666-3899
refubium.resourceType.provider
WoS-Alert