dc.contributor.author
Steiert, Daniel
dc.contributor.author
Wittig, Corey
dc.contributor.author
Banerjee, Priyanka
dc.contributor.author
Preissner, Robert
dc.contributor.author
Szulcek, Robert
dc.date.accessioned
2025-07-25T10:23:58Z
dc.date.available
2025-07-25T10:23:58Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/48362
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-48084
dc.description.abstract
Background
The growth of scientific literature makes it a daunting challenge for researchers to stay informed of advances across multiple disciplines.
Objective
We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed.
Methods
Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications.
Results
Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique measuring a term’s significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we discovered potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora to find novel, as yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our in vitro tests confirmed amiodarone’s ability to significantly reduce platelet aggregation by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p < 0.001).
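As an illustration of the TF-IDF scoring mentioned above, the following is a minimal sketch of the textbook, unsmoothed form (term frequency multiplied by the log of inverse document frequency). The record does not state which TF-IDF variant or library PubDigest uses, so the function name `tf_idf`, the whitespace tokenization, and the three-sentence toy corpus are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (textbook, unsmoothed TF-IDF)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = relative frequency in this document; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, not taken from the PubMed literature.
docs = [
    "pulmonary hypertension after thrombosis",
    "venous thrombosis and platelet aggregation",
    "pulmonary embolism and thrombosis",
]
scores = tf_idf(docs)
```

In this toy corpus, a term that occurs in every document ("thrombosis") receives a score of zero, while a term confined to one document ("hypertension") scores highest there, which is the sense in which TF-IDF measures how distinctive a term is for a given text.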
Conclusions
While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like in vitro testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.
en
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
drug research and development
en
dc.subject
Natural Language Processing
en
dc.subject.ddc
600 Technology, Medicine, Applied Sciences::610 Medicine and Health::610 Medicine and Health
dc.title
An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
e1012417
dcterms.bibliographicCitation.doi
10.1371/journal.pcbi.1012417
dcterms.bibliographicCitation.journaltitle
PLOS Computational Biology
dcterms.bibliographicCitation.number
9
dcterms.bibliographicCitation.originalpublishername
Public Library of Science (PLoS)
dcterms.bibliographicCitation.volume
20
refubium.affiliation
Charité - Universitätsmedizin Berlin
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.bibliographicCitation.pmid
39264975
dcterms.isPartOf.eissn
1553-7358