对 CTEPH 药物的探索:将自然语言处理、嵌入式学习、体外模型和真实世界证据相结合,促进药物再利用

IF 3.8 2区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS
Daniel Steiert, Corey Wittig, Priyanka Banerjee, Robert Preissner, Robert Szulcek
{"title":"对 CTEPH 药物的探索:将自然语言处理、嵌入式学习、体外模型和真实世界证据相结合,促进药物再利用","authors":"Daniel Steiert, Corey Wittig, Priyanka Banerjee, Robert Preissner, Robert Szulcek","doi":"10.1371/journal.pcbi.1012417","DOIUrl":null,"url":null,"abstract":"Background In the modern era, the growth of scientific literature presents a daunting challenge for researchers to keep informed of advancements across multiple disciplines. Objective We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed. Methods Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications. Results Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique measuring a term’s significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we discovered potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora to find novel, yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our <jats:italic>in vitro</jats:italic> tests confirmed amiodarone’s ability to reduce platelet aggregation significantly by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p&lt;0.001). Conclusions While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like <jats:italic>in vitro</jats:italic> testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing\",\"authors\":\"Daniel Steiert, Corey Wittig, Priyanka Banerjee, Robert Preissner, Robert Szulcek\",\"doi\":\"10.1371/journal.pcbi.1012417\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background In the modern era, the growth of scientific literature presents a daunting challenge for researchers to keep informed of advancements across multiple disciplines. Objective We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed. Methods Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications. Results Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique measuring a term’s significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we discovered potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora to find novel, yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our <jats:italic>in vitro</jats:italic> tests confirmed amiodarone’s ability to reduce platelet aggregation significantly by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p&lt;0.001). Conclusions While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like <jats:italic>in vitro</jats:italic> testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.\",\"PeriodicalId\":20241,\"journal\":{\"name\":\"PLoS Computational Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS Computational Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pcbi.1012417\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1012417","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

背景 在现代社会,科学文献的增长为研究人员了解多个学科的进展带来了严峻的挑战。目的 我们运用自然语言处理(NLP)和嵌入式学习概念设计了 PubDigest,这是一种梳理 PubMed 文献的工具,旨在找出可重新利用的潜在药物。方法 利用 NLP,特别是通过词嵌入的术语关联,我们探索了药物与疾病之间未被认识到的关系。为了说明 PubDigest 的实用性,我们重点研究了慢性血栓栓塞性肺动脉高压 (CTEPH),这是一种罕见疾病,发表的科学文献总体数量有限。结果 我们的文献分析通过应用术语频率-反向文档频率(TF-IDF)评分(一种衡量术语在文本语料库中重要性的技术)确定了与 CTEPH 相关的关键临床特征。这样我们就能绘制出相关疾病的地图。其中最突出的是静脉血栓 (VT),它与 CTEPH 有很强的语义联系。深入研究后,我们发现了 CTEPH 的潜在再利用候选词,通过基于大规模神经网络的文献上下文化和 CTEPH 与 VT 文献库的预测建模,我们找到了这两种疾病之间尚未认识到的新关联。除了抗血栓药物卡普珠单抗外,苯并呋喃衍生物也是一个引人入胜的发现。特别是,苯并呋喃衍生物胺碘酮在文献中显示出潜在的抗血栓特性。我们的体外测试证实,胺碘酮能够将血小板聚集率显著降低 68%(p = 0.02)。然而,实际临床数据表明,接受胺碘酮治疗的 CTEPH 患者面临的死亡风险显著增加了 15.9%(p<0.001)。结论 NLP为解读科学文献提供了一种创新方法,特别是在药物再利用方面,但将其与体外测试和真实世界证据等补充方法结合起来至关重要。我们在苯并呋喃衍生物和CTEPH方面的探索强调了这一点。因此,将NLP与实践实验和真实世界的临床数据相结合,可以为更快、更安全的药物再利用方法铺平道路,尤其是针对CTEPH等罕见疾病。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing
Background In the modern era, the growth of scientific literature presents a daunting challenge for researchers to keep informed of advancements across multiple disciplines. Objective We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed. Methods Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications. Results Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique measuring a term’s significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we discovered potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora to find novel, yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our in vitro tests confirmed amiodarone’s ability to reduce platelet aggregation significantly by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p<0.001). Conclusions While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like in vitro testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
PLoS Computational Biology
PLoS Computational Biology BIOCHEMICAL RESEARCH METHODS-MATHEMATICAL & COMPUTATIONAL BIOLOGY
CiteScore
7.10
自引率
4.70%
发文量
820
审稿时长
2.5 months
期刊介绍: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信