Hua Zhang , Pengliang Chen , Xiaoqi Yang , Junhao Wang , Guogen Shan , Bi Chen , Bo Jiang
{"title":"Language model encoded multi-scale feature fusion and transformation for predicting protein-peptide binding sites","authors":"Hua Zhang , Pengliang Chen , Xiaoqi Yang , Junhao Wang , Guogen Shan , Bi Chen , Bo Jiang","doi":"10.1016/j.patcog.2025.112487","DOIUrl":null,"url":null,"abstract":"<div><div>Protein-peptide interactions serve as a pivotal and indispensable role in diverse biological functions and cellular processes. Although recent studies have begun to employ language models for predicting protein-peptide binding sites (PPBS), the majority of previous approaches have persisted in utilizing intricate sequence-based feature engineering or incorporating costly experimental structural information. To overcome these limitations, we develop a novel sequence-based end-to-end PPBS predictor using deep learning, named Language model encoded Multi-scale Feature Fusion and Transformation (LMFFT). The proposed model starts with a single protein language model for comprehensive multi-scale feature extraction, including residue, dipeptide, and fragment-level representations, which are implemented by the dipeptide embedding-based fragment fusion and further enhanced through the dipeptide contextual encoding. Moreover, multi-scale convolutional neural networks are applied to transform multi-scale features by capturing intricate interactions between local and global information. Our LMFFT achieves state-of-the-art performance across three benchmark datasets, outperforming existing sequence-based methods and demonstrating competitive advantages over certain structure-based baselines. This work provides a cost-effective and efficient solution for PPBS prediction, advancing revealing the sequence-function relationship of proteins.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112487"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325011501","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Protein-peptide interactions serve as a pivotal and indispensable role in diverse biological functions and cellular processes. Although recent studies have begun to employ language models for predicting protein-peptide binding sites (PPBS), the majority of previous approaches have persisted in utilizing intricate sequence-based feature engineering or incorporating costly experimental structural information. To overcome these limitations, we develop a novel sequence-based end-to-end PPBS predictor using deep learning, named Language model encoded Multi-scale Feature Fusion and Transformation (LMFFT). The proposed model starts with a single protein language model for comprehensive multi-scale feature extraction, including residue, dipeptide, and fragment-level representations, which are implemented by the dipeptide embedding-based fragment fusion and further enhanced through the dipeptide contextual encoding. Moreover, multi-scale convolutional neural networks are applied to transform multi-scale features by capturing intricate interactions between local and global information. Our LMFFT achieves state-of-the-art performance across three benchmark datasets, outperforming existing sequence-based methods and demonstrating competitive advantages over certain structure-based baselines. This work provides a cost-effective and efficient solution for PPBS prediction, advancing revealing the sequence-function relationship of proteins.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.