Protein-peptide interaction region residues prediction using a generative sampling technique and ensemble deep learning-based models
Authors: Shima Shafiee, Abdolhossein Fathi, Ghazaleh Taherzadeh
Journal: Applied Soft Computing, volume 182, Article 113603 (JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-07-12
DOI: 10.1016/j.asoc.2025.113603
URL: https://www.sciencedirect.com/science/article/pii/S1568494625009147
Motivation
Predicting protein-peptide interactions advances the understanding of drug design, protein biological functions, and cellular processes. Researchers have proposed various experimental and computational methods to identify interactions between proteins and peptides. However, traditional experimental approaches are laborious, time-consuming, and inefficient. Motivated by these challenges, we developed a novel computational method that detects protein-peptide interaction region residues from protein data, providing a complementary approach to experimental techniques.
Method
We designed a computational method for identifying protein-peptide interaction region residues that combines a generative sampling technique with an ensemble deep learning (DL) model, using various features derived from protein sequences and structures. The proposed method relies on three pipelines: pre-processing, processing, and post-processing. The pre-processing pipeline converts the amino acid sequence into an image-like input representation to capture vital residue interactions. To address class imbalance and the resulting over-prediction of the non-binding class, it also employs a generative sampling technique to balance the training data. Next, to achieve more reliable prediction of protein-peptide interactions, the processing pipeline incorporates three independent DL sub-models. Finally, in the post-processing pipeline, the outputs of the ensemble DL modules are fed to a three-layer convolutional neural network to obtain the final prediction results.
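The abstract does not say how the image-like representation is built. As a minimal sketch only, one common way to give each residue a two-dimensional input is a sliding window of one-hot encoded neighbours. The window size, the one-hot features, and the names `one_hot` and `residue_windows` are all assumptions made for illustration, not the authors' encoding.

```python
# Hypothetical sketch: per-residue "image" built from a sliding window of
# one-hot encoded neighbours. Not the encoding used in the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-dimensional one-hot vector (all zeros for padding/unknown)."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def residue_windows(sequence, half_width=3):
    """Build one (2 * half_width + 1) x 20 matrix per residue, centred on it."""
    padded = "-" * half_width + sequence + "-" * half_width
    size = 2 * half_width + 1
    return [
        [one_hot(r) for r in padded[i:i + size]]
        for i in range(len(sequence))
    ]


# Example with an arbitrary 8-residue sequence (assumed, for illustration).
windows = residue_windows("MKTAYIAK", half_width=2)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each matrix can then be treated like a single-channel image by a convolutional model. Rows that fall outside the sequence are zero-padded so every residue gets a matrix of the same shape.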
Results
Compared to state-of-the-art sequence- and structure-based methods, the proposed method achieved the highest F-measure (improved by 22.1%) and precision (improved by 3.9%), along with a better balance between sensitivity and specificity. Overall, our experiments validated the effectiveness of the proposed method as a reliable computational assistant for predicting protein-peptide interaction region residues.
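For readers less familiar with the reported metrics, the sketch below computes precision, sensitivity, specificity, and F-measure from per-residue binary labels using their standard definitions. The function name `binary_metrics` and the toy labels are assumptions for illustration; they are not data from the paper.

```python
# Standard binary-classification metrics; 1 = binding residue, 0 = non-binding.

def binary_metrics(y_true, y_pred):
    """Return precision, sensitivity, specificity and F-measure as a dict."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Avoid ZeroDivisionError when a class is absent.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Toy labels (assumed): 3 binding and 5 non-binding residues.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because binding residues are usually a small minority, a model can reach high specificity while missing most binding sites, which is why the balance between sensitivity and specificity is reported alongside the F-measure.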
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.