{"title":"Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors","authors":"Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan","doi":"arxiv-2407.15202","DOIUrl":null,"url":null,"abstract":"Drug-Target binding Affinity (DTA) prediction is essential for drug\ndiscovery. Despite the application of deep learning methods to DTA prediction,\nthe achieved accuracy remain suboptimal. In this work, inspired by the recent\nsuccess of retrieval methods, we propose $k$NN-DTA, a non-parametric\nembedding-based retrieval method adopted on a pre-trained DTA prediction model,\nwhich can extend the power of the DTA model with no or negligible cost.\nDifferent from existing methods, we introduce two neighbor aggregation ways\nfrom both embedding space and label space that are integrated into a unified\nframework. Specifically, we propose a \\emph{label aggregation} with\n\\emph{pair-wise retrieval} and a \\emph{representation aggregation} with\n\\emph{point-wise retrieval} of the nearest neighbors. This method executes in\nthe inference phase and can efficiently boost the DTA prediction performance\nwith no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, an\ninstance-wise and adaptive aggregation with lightweight learning. Results on\nfour benchmark datasets show that $k$NN-DTA brings significant improvements,\noutperforming previous state-of-the-art (SOTA) results, e.g, on BindingDB\nIC$_{50}$ and $K_i$ testbeds, $k$NN-DTA obtains new records of RMSE\n$\\bf{0.684}$ and $\\bf{0.750}$. The extended Ada-$k$NN-DTA further improves the\nperformance to be $\\bf{0.675}$ and $\\bf{0.735}$ RMSE. These results strongly\nprove the effectiveness of our method. Results in other settings and\ncomprehensive studies/analyses also show the great potential of our $k$NN-DTA\napproach.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuanBio - Biomolecules","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.15202","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric, embedding-based retrieval method built on top of a pre-trained DTA prediction model, which extends the power of the DTA model at no or negligible cost.
Different from existing methods, we introduce two neighbor aggregation schemes, one in the embedding space and one in the label space, integrated into a unified framework. Specifically, we propose \emph{label aggregation} with \emph{pair-wise retrieval} and \emph{representation aggregation} with \emph{point-wise retrieval} of the nearest neighbors. The method runs purely at inference time and can efficiently boost DTA prediction performance with no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, which performs instance-wise, adaptive aggregation with lightweight learning. Results on
four benchmark datasets show that $k$NN-DTA brings significant improvements, outperforming previous state-of-the-art (SOTA) results; e.g., on the BindingDB IC$_{50}$ and $K_i$ testbeds, $k$NN-DTA sets new records of RMSE $\bf{0.684}$ and $\bf{0.750}$. The extended Ada-$k$NN-DTA further improves these to RMSE $\bf{0.675}$ and $\bf{0.735}$. These results strongly demonstrate the effectiveness of our method. Results in other settings and comprehensive studies and analyses also show the great potential of our $k$NN-DTA approach.
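
To make the retrieval-and-aggregation idea concrete, below is a minimal sketch of $k$NN label aggregation over an embedding-space datastore, assuming pre-computed embeddings and measured affinities for the training drug-target pairs. The function name, the Euclidean distance, the softmax temperature, and the interpolation weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_label_aggregation(query_emb, model_pred, datastore_embs, datastore_labels,
                          k=8, temperature=1.0, lam=0.5):
    """Blend a model's DTA prediction with labels of nearby training pairs.

    query_emb        : (d,) embedding of the query drug-target pair
    model_pred       : scalar affinity predicted by the pre-trained model
    datastore_embs   : (N, d) embeddings of training drug-target pairs
    datastore_labels : (N,) measured affinities for those pairs
    k, temperature, lam : illustrative hyper-parameters (assumptions)
    """
    # Euclidean distance from the query to every stored pair embedding.
    dists = np.linalg.norm(datastore_embs - query_emb, axis=1)

    # Indices of the k nearest neighbors.
    nn_idx = np.argsort(dists)[:k]

    # Softmax weights over negative distances: closer neighbors count more.
    logits = -dists[nn_idx] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted average of the neighbors' affinity labels.
    knn_pred = float(np.dot(weights, datastore_labels[nn_idx]))

    # Interpolate the retrieval estimate with the model's own prediction.
    return lam * knn_pred + (1.0 - lam) * model_pred


# Toy usage with random embeddings standing in for a real datastore.
rng = np.random.default_rng(0)
store = rng.normal(size=(1000, 128))
labels = rng.normal(loc=6.0, scale=1.5, size=1000)   # pIC50-like values
query = rng.normal(size=128)
print(knn_label_aggregation(query, model_pred=6.2,
                            datastore_embs=store, datastore_labels=labels))
```

In the same spirit, the representation aggregation branch would presumably retrieve point-wise neighbors (per drug and per target rather than per pair) and aggregate their embeddings before the prediction head, instead of averaging labels; since this runs only at inference, neither branch requires retraining the underlying model.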