Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances. Pub Date: 2024-07-22. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae106

Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang


Citations: 0

Abstract

Motivation: Drug-target interactions (DTIs) play a pivotal role in drug discovery, which aims to identify potential drug targets and elucidate their mechanisms of action. In recent years, natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, offering the potential to mine large volumes of text and extract DTIs from the literature efficiently.

Results: In this article, we frame DTI extraction as an entity-relation extraction problem and use several pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach combining gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, the top-ranked performance among all models submitted to the official evaluation. Furthermore, we compare gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insight into their impact on performance. Our findings highlight the potential of NLP-based text mining that uses gene and chemical descriptions to improve drug-target extraction.
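The abstract does not spell out how descriptions are fed to the models or how the ensemble is formed, so the following is only a minimal sketch of one plausible setup, not the authors' implementation. It assumes that entity mentions are wrapped in marker tokens, that the chemical and gene descriptions are appended after `[SEP]` separators, and that ensemble members are combined by averaging their class probabilities. The marker names, the `Candidate` class, the helper functions, and the example descriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Candidate:
    """One chemical-gene pair in a sentence (character offsets, end-exclusive)."""
    sentence: str
    chem_span: Tuple[int, int]
    gene_span: Tuple[int, int]
    chem_description: str = ""  # assumed to be looked up from CTD beforehand
    gene_description: str = ""  # assumed to be looked up from Entrez Gene beforehand


def mark_entities(c: Candidate) -> str:
    """Wrap both mentions in marker tokens (marker names are an assumption)."""
    spans = sorted(
        [(c.chem_span, "[CHEM]", "[/CHEM]"), (c.gene_span, "[GENE]", "[/GENE]")],
        key=lambda item: item[0][0],
        reverse=True,  # edit right-to-left so earlier offsets stay valid
    )
    text = c.sentence
    for (start, end), open_tok, close_tok in spans:
        text = f"{text[:start]}{open_tok} {text[start:end]} {close_tok}{text[end:]}"
    return text


def build_input(c: Candidate) -> str:
    """Append the available descriptions to the marked sentence."""
    parts = [mark_entities(c)]
    for description in (c.chem_description, c.gene_description):
        if description:
            parts.append(description)
    return " [SEP] ".join(parts)


def ensemble_average(prob_sets: Sequence[Dict[str, float]]) -> str:
    """Average per-class probabilities over models and return the top label."""
    labels = prob_sets[0].keys()
    mean = {lab: sum(p[lab] for p in prob_sets) / len(prob_sets) for lab in labels}
    return max(mean, key=mean.get)


if __name__ == "__main__":
    # Placeholder descriptions, not actual database text.
    example = Candidate(
        sentence="Aspirin inhibits COX-1.",
        chem_span=(0, 7),
        gene_span=(17, 22),
        chem_description="A salicylate drug.",
        gene_description="Prostaglandin-endoperoxide synthase 1.",
    )
    print(build_input(example))
    # Made-up probabilities standing in for three fine-tuned models.
    print(ensemble_average([
        {"INHIBITOR": 0.6, "NONE": 0.4},
        {"INHIBITOR": 0.3, "NONE": 0.7},
        {"INHIBITOR": 0.8, "NONE": 0.2},
    ]))
```

In practice the string from `build_input` would be tokenized and passed to each fine-tuned transformer; probability averaging is only one common way to combine models, and majority voting would be an equally reasonable choice under the same assumptions.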

Availability and implementation: Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.


Source journal

Bioinformatics advances

CiteScore

1.60

Self-citation rate

0.00%

Number of publications