{"title":"Identification of Multi-functional Therapeutic Peptides Based on Prototypical Supervised Contrastive Learning.","authors":"Sitong Niu, Henghui Fan, Fei Wang, Xiaomei Yang, Junfeng Xia","doi":"10.1007/s12539-024-00674-3","DOIUrl":null,"url":null,"abstract":"<p><p>High-throughput sequencing has exponentially increased peptide sequences, necessitating a computational method to identify multi-functional therapeutic peptides (MFTP) from their sequences. However, existing computational methods are challenged by class imbalance, particularly in learning effective sequence representations. To address this, we propose PSCFA, a prototypical supervised contrastive learning with a feature augmentation method for MFTP prediction. We employ a two-stage training scheme to train the feature extractor and the classifier respectively, underpinned by the principle that better feature representation boosts classification accuracy. In the first stage, we utilize a prototypical supervised contrastive learning strategy to enhance the uniformity of feature space distribution, ensuring that the characteristics of samples within the same category are tightly clustered while those from different categories are more dispersed. In the second stage, a feature augmentation strategy that focuses on infrequent labels (tail labels) is used to refine the learning process of the classifier. We use a prototype-based variational autoencoder to capture semantic links among common labels (head labels) and their prototypes. This knowledge is then transferred to tail labels, generating enhanced features for classifier training. The experiments prove that the PSCFA method significantly outperforms existing methods for MFTP prediction, making a significant advancement in therapeutic peptide identification.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interdisciplinary Sciences: Computational Life Sciences","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1007/s12539-024-00674-3","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
High-throughput sequencing has exponentially increased peptide sequences, necessitating a computational method to identify multi-functional therapeutic peptides (MFTP) from their sequences. However, existing computational methods are challenged by class imbalance, particularly in learning effective sequence representations. To address this, we propose PSCFA, a prototypical supervised contrastive learning with a feature augmentation method for MFTP prediction. We employ a two-stage training scheme to train the feature extractor and the classifier respectively, underpinned by the principle that better feature representation boosts classification accuracy. In the first stage, we utilize a prototypical supervised contrastive learning strategy to enhance the uniformity of feature space distribution, ensuring that the characteristics of samples within the same category are tightly clustered while those from different categories are more dispersed. In the second stage, a feature augmentation strategy that focuses on infrequent labels (tail labels) is used to refine the learning process of the classifier. We use a prototype-based variational autoencoder to capture semantic links among common labels (head labels) and their prototypes. This knowledge is then transferred to tail labels, generating enhanced features for classifier training. The experiments prove that the PSCFA method significantly outperforms existing methods for MFTP prediction, making a significant advancement in therapeutic peptide identification.
期刊介绍:
Interdisciplinary Sciences--Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of sciences, especially focusing on computational life sciences, an area that is enjoying rapid development at the forefront of scientific research and technology.
The journal publishes original papers of significant general interest covering recent research and developments. Articles will be published rapidly by taking full advantage of internet technology for online submission and peer-reviewing of manuscripts, and then by publishing OnlineFirstTM through SpringerLink even before the issue is built or sent to the printer.
The editorial board consists of many leading scientists with international reputation, among others, Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), Weitao Yang (Duke University, USA). Prof. Dongqing Wei at the Shanghai Jiatong University is appointed as the editor-in-chief; he made important contributions in bioinformatics and computational physics and is best known for his ground-breaking works on the theory of ferroelectric liquids. With the help from a team of associate editors and the editorial board, an international journal with sound reputation shall be created.