Protocol for: A Simple, Accessible, Literature-based Drug Repurposing Pipeline

medRxiv - Health Informatics. Pub Date: 2024-07-19. DOI: 10.1101/2024.07.18.24310641

Maximin Lange, Eoin Gogarty, Meredith Martyn, Philip Braude, Feras Fayez, Ben Carter

Abstract

We will develop a novel approach to drug repurposing using Natural Language Processing (NLP) and Literature-Based Discovery (LBD) techniques. The result will be a simplified, accessible drug repurposing pipeline that uses Word2Vec embeddings trained on PubMed abstracts to identify medications that could be repurposed. We present this approach in the context of antipsychotics, but it could be repeated for any available medication. The research is structured in three stages:

1. Identification of candidate medications using the Word2Vec algorithm trained on scientific literature.
2. Empirical testing of the identified candidates using a large hospital dataset to explore protective effects against disease onset.
3. Validation of findings using a second, independent dataset to assess generalizability.

This method addresses limitations in current machine learning-based drug repurposing approaches, including lack of external validation and limited accessibility. By leveraging Word2Vec's ability to capture semantic relationships between words, the study aims to uncover hidden connections in the medical literature that may lead to novel therapeutic discoveries. The protocol emphasizes transparency and reproducibility, using publicly available electronic health record (EHR) databases for validation. The approach is intended to yield tangible results even for researchers with limited machine learning expertise, bridging the gap between the biomedical and information systems communities.
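As a rough illustration of the first stage, the sketch below ranks candidate terms by cosine similarity to the centroid of a set of seed terms. It is not the authors' implementation: the function names, the placeholder drug names, and the three-dimensional vectors are all invented for illustration, and the choice of a seed centroid is an assumption. In a real pipeline the vectors would come from a Word2Vec model trained on PubMed abstracts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, seed_terms, candidates, top_n=3):
    """Rank candidates by similarity to the centroid of the seed terms.

    Hypothetical helper: `embeddings` maps a term to its vector.
    """
    dim = len(next(iter(embeddings.values())))
    # Average the seed vectors component-wise to get a single query vector.
    centroid = [
        sum(embeddings[term][i] for term in seed_terms) / len(seed_terms)
        for i in range(dim)
    ]
    scored = [
        (name, cosine(embeddings[name], centroid))
        for name in candidates
        if name in embeddings and name not in seed_terms
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, invented for illustration only; they carry no
# pharmacological meaning and are not taken from any trained model.
TOY_EMBEDDINGS = {
    "seed_drug_1": [0.9, 0.1, 0.0],
    "seed_drug_2": [0.8, 0.2, 0.1],
    "candidate_a": [0.7, 0.3, 0.1],
    "candidate_b": [0.1, 0.9, 0.2],
    "candidate_c": [0.0, 0.2, 0.9],
}

ranked = rank_candidates(
    TOY_EMBEDDINGS,
    seed_terms=["seed_drug_1", "seed_drug_2"],
    candidates=["candidate_a", "candidate_b", "candidate_c"],
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `candidate_a` ranks first because its vector points in nearly the same direction as the seed centroid. Under the protocol, a ranking like this would only nominate candidates; the second and third stages would then test them against hospital and independent EHR data.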
