Fang Zheng, Juanjuan Zhao, Zihang Yuan, Yuanchen Gao, Yafeng Li, Yaheng Li, Yan Geng, Yan Qiang
Molecular Diversity, published 2025-04-25. DOI: 10.1007/s11030-025-11194-7 (https://doi.org/10.1007/s11030-025-11194-7). Journal Article. Impact factor 3.9; JCR Q2 (Chemistry, Applied).
Interpretable drug-target affinity prediction based on pre-trained models' output as embeddings and based on structure-aware cross-attention for feature fusion.
The characteristics of protein pockets can better capture the interaction information between proteins and small molecules, thereby improving the performance of drug-target interaction (DTI) prediction tasks. However, pocket data typically need to be predicted using software such as AlphaFold, which would entail a massive workload for datasets ranging from tens of thousands to hundreds of thousands of samples. Moreover, feature representation networks for 3D pocket data are computationally intensive. To address this, we propose simulating 3D pocket data using sequence data through feature fusion of two different objects based on structure cross-attention (CASD). Additionally, precise feature representation is a prerequisite for accurately identifying pocket information. We introduce a method that leverages the output of the last layer of a pre-trained model as an embedding layer for training a new model from scratch. This approach not only incorporates prior knowledge from the pre-trained model but also expands model capacity, enabling more accurate feature representation. Furthermore, we enhance the multimodal representation of small molecule compounds using feature fusion based on structure cross-attention for the same object (CASS), further improving feature representation capabilities. Our cross-attention mechanisms operate at the token level or node level, allowing fine-grained capture of interactions between amino acids and atoms. This enables the identification of the contribution score of each atom or amino acid to the task, making our model interpretable for drug-target prediction. Experimental validation demonstrates that our model achieves state-of-the-art predictive performance.
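The token-level fusion described in the abstract can be illustrated with a minimal sketch of generic scaled dot-product cross-attention, in which protein residues (queries) attend over ligand atoms (keys and values). This is not the paper's CASD or CASS module: the structure-aware components and learned projection matrices are omitted, the embeddings are random stand-ins for the last-layer output of a pre-trained encoder, and all names and sizes (`residue_emb`, `atom_emb`, `dim`) are assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query row attends over all key rows."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_residues, n_atoms, dim = 5, 4, 8  # hypothetical sizes

# Assumption: random stand-ins for frozen last-layer outputs of pre-trained
# protein and molecule encoders (one vector per amino acid / per atom).
residue_emb = random_matrix(n_residues, dim, rng)
atom_emb = random_matrix(n_atoms, dim, rng)

# Residues query the atoms; `fused` holds one atom-aware vector per residue.
fused, weights = cross_attention(residue_emb, atom_emb, atom_emb)

# Assumed aggregation: mean attention each atom receives across all residues.
atom_scores = [sum(col) / n_residues for col in zip(*weights)]
```

Averaging each column of the attention matrix gives one possible per-atom contribution score; the abstract does not specify how the paper derives its scores, so treat this aggregation as an assumption.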
About the journal:
Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including:
combinatorial chemistry and parallel synthesis;
small molecule libraries;
microwave synthesis;
flow synthesis;
fluorous synthesis;
diversity oriented synthesis (DOS);
nanoreactors;
click chemistry;
multiplex technologies;
fragment- and ligand-based design;
structure/function/SAR;
computational chemistry and molecular design;
chemoinformatics;
screening techniques and screening interfaces;
analytical and purification methods;
robotics, automation and miniaturization;
targeted libraries;
display libraries;
peptides and peptoids;
proteins;
oligonucleotides;
carbohydrates;
natural diversity;
new methods of library formulation and deconvolution;
directed evolution, origin of life and recombination;
search techniques, landscapes, random chemistry and more.