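Protein-peptide interaction region residues prediction using a generative sampling technique and ensemble deep learning-based models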

IF 7.2 | Zone 1: Computer Science | JCR Q1: Computer Science, Artificial Intelligence

Applied Soft Computing, vol. 182, Article 113603 | Pub Date: 2025-07-12 | DOI: 10.1016/j.asoc.2025.113603

Shima Shafiee, Abdolhossein Fathi, Ghazaleh Taherzadeh


Citations: 0

Abstract

Protein-peptide interaction region residues prediction using a generative sampling technique and ensemble deep learning-based models

Motivation

Predicting protein-peptide interactions advances the understanding of drug design, protein biological functions, and cellular processes. Researchers have proposed various experimental and computational methods to identify interactions between proteins and peptides. However, traditional experimental approaches are laborious, time-consuming, and inefficient. Motivated by these challenges, we developed a novel computational method that detects protein-peptide interaction region residues from protein data, providing a complement to experimental techniques.

Method

We designed a computational method for identifying protein-peptide interaction region residues that combines a generative sampling technique with an ensemble deep learning (DL) model, using features derived from protein sequences and structures. The method consists of three pipelines: pre-processing, processing, and post-processing. The pre-processing pipeline converts the amino acid sequence into an image-like input representation to capture vital residue interactions. To overcome class imbalance and the resulting over-prediction of non-binding residues, it also employs a generative sampling technique to balance the training data. The processing pipeline then incorporates three independent DL sub-models to achieve more reliable prediction of protein-peptide interactions. Finally, in the post-processing pipeline, the outputs of the ensemble DL modules are fed to a three-layer convolutional neural network to obtain the final prediction.
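The abstract does not say how the image-like representation is built. As a purely illustrative sketch, one common way to give each residue a 2D input is a sliding window of per-residue feature rows. The one-hot encoding, the window size, and the names `one_hot` and `residue_images` are assumptions made here, not the authors' settings or code.

```python
# Hypothetical sketch: one-hot sliding windows as an "image-like" residue input.
# The window size and the one-hot encoding are assumptions, not the paper's settings.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-dim one-hot row; unknown residues and padding map to all zeros."""
    row = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        row[idx] = 1
    return row


def residue_images(sequence, window=7):
    """Build one (window x 20) matrix per residue, centred on that residue.

    Positions that fall outside the sequence are zero-padded, so every
    residue gets an input of the same shape.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    return [
        [one_hot(res) for res in padded[i:i + window]]
        for i in range(len(sequence))
    ]


images = residue_images("MKTAYIAK", window=5)
print(len(images), len(images[0]), len(images[0][0]))  # 8 5 20
```

Each residue then has a fixed-shape matrix that a convolutional sub-model could consume. In practice the one-hot rows would likely be replaced or extended by richer sequence- and structure-derived features, which the abstract mentions but does not enumerate.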

Results

Compared with state-of-the-art sequence- and structure-based methods, the proposed method achieved the highest F-measure (improved by 22.1%) and precision (improved by 3.9%), along with a better balance between sensitivity and specificity. Overall, our experiments validated the effectiveness of the proposed method as a reliable computational assistant for predicting protein-peptide interaction region residues.
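For readers unfamiliar with the reported metrics, the following sketch computes precision, sensitivity, specificity, and F-measure from per-residue binary labels using their standard definitions. The function name `binding_metrics` and the toy labels are invented for illustration and are not data from the paper.

```python
# Standard binary-classification metrics for per-residue predictions
# (1 = binding residue, 0 = non-binding). The labels below are made-up toy data.

def binding_metrics(y_true, y_pred):
    """Return precision, sensitivity, specificity and F-measure as a dict."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


y_true = [1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
metrics = binding_metrics(y_true, y_pred)
print(metrics)
```

Because binding residues are typically a small minority, a predictor that labels almost everything as non-binding can score high specificity while sensitivity and F-measure stay low, which is why the balance between the two is reported alongside F-measure.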

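Source journal

Applied Soft Computing (Engineering and Technology - Computer Science: Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months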

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.