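Interpretable drug-target affinity prediction based on pre-trained models' output as embeddings and based on structure-aware cross-attention for feature fusion.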

Impact factor: 3.9 | Zone 2 (Chemistry) | JCR Q2 (Chemistry, Applied)

Molecular Diversity | Pub Date: 2025-08-01 | Epub Date: 2025-04-25 | DOI: 10.1007/s11030-025-11194-7

Fang Zheng, Juanjuan Zhao, Zihang Yuan, Yuanchen Gao, Yafeng Li, Yaheng Li, Yan Geng, Yan Qiang

Pages: 3537-3554
Publication type: Journal Article

Citations: 0

Abstract


Interpretable drug-target affinity prediction based on pre-trained models' output as embeddings and based on structure-aware cross-attention for feature fusion.

The characteristics of protein pockets can better capture the interaction information between proteins and small molecules, thereby improving the performance of drug-target interaction (DTI) prediction tasks. However, pocket data typically need to be predicted using software such as AlphaFold, which would entail a massive workload for datasets ranging from tens of thousands to hundreds of thousands of samples. Moreover, feature representation networks for 3D pocket data are computationally intensive. To address this, we propose simulating 3D pocket data using sequence data through feature fusion of two different objects based on structure cross-attention (CASD). Additionally, precise feature representation is a prerequisite for accurately identifying pocket information. We introduce a method that leverages the output of the last layer of a pre-trained model as an embedding layer for training a new model from scratch. This approach not only incorporates prior knowledge from the pre-trained model but also expands model capacity, enabling more accurate feature representation. Furthermore, we enhance the multimodal representation of small molecule compounds using feature fusion based on structure cross-attention for the same object (CASS), further improving feature representation capabilities. Our cross-attention mechanisms operate at the token level or node level, allowing fine-grained capture of interactions between amino acids and atoms. This enables the identification of the contribution score of each atom or amino acid to the task, making our model interpretable for drug-target prediction. Experimental validation demonstrates that our model achieves state-of-the-art predictive performance.
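To make the token-level cross-attention idea concrete, the following is a minimal, dependency-free sketch of generic scaled dot-product cross-attention between drug atoms (queries) and amino-acid residues (keys and values). It is not the paper's CASD or CASS module: the learned projection matrices and any structure-aware terms are omitted, the random vectors merely stand in for the last-layer outputs of pre-trained encoders, and the names `cross_attention` and `residue_contributions`, as well as the averaging rule used for contribution scores, are assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: one vector per drug atom (n_atoms x d)
    keys, values: one vector per amino-acid residue (n_res x d)
    Returns (fused, weights): fused is n_atoms x d, weights is n_atoms x n_res.
    """
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]  # transpose to d x n_res
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    fused = matmul(weights, values)
    return fused, weights


def residue_contributions(weights):
    """Mean attention each residue receives over all atoms (hypothetical scoring rule)."""
    n_atoms = len(weights)
    return [sum(col) / n_atoms for col in zip(*weights)]


# Random stand-ins for pre-trained last-layer embeddings (assumption, not real data).
rng = random.Random(0)
dim, n_atoms, n_res = 8, 5, 12
atom_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_atoms)]
res_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_res)]

fused, weights = cross_attention(atom_emb, res_emb, res_emb)
scores = residue_contributions(weights)
top_residue = max(range(n_res), key=scores.__getitem__)
print(f"most attended residue index: {top_residue}")
```

Each row of `weights` is a probability distribution over residues for one atom, so reading the matrix row-wise or column-wise gives per-atom or per-residue scores. That is the sense in which attention at this granularity can support interpretation; swapping the roles of queries and keys would yield atom-level scores instead.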


Source journal

Molecular Diversity (Chemistry: Chemistry, Multidisciplinary)

CiteScore: 7.30
Self-citation rate: 7.90%
Articles published: 219
Review time: 2.7 months

About the journal: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more;