NeuroPpred-MSN: A Neuropeptide Prediction Model Based on Multi-feature Fusion and Siamese Networks.
Authors: Jian Wen, Minyu Chen, Yongqi Shen, Honghong Wang, Zhuoyu Wei, Lichuan Gu, Xiaolei Zhu
Journal: Interdisciplinary Sciences: Computational Life Sciences (impact factor 3.9; JCR Q1, Mathematical & Computational Biology)
Publication date: 2025-06-03
Publication type: Journal Article
DOI: 10.1007/s12539-025-00730-6 (https://doi.org/10.1007/s12539-025-00730-6)
The discovery of neuropeptides offers numerous opportunities for identifying novel drugs and targets to treat a variety of diseases. Although various computational methods have been proposed, there is still room to improve predictive performance. In this work, we introduce NeuroPpred-MSN, an innovative and efficient neuropeptide prediction model that leverages multi-feature fusion and Siamese networks. To represent neuropeptide information comprehensively, the peptide sequences are encoded with four schemes (token embedding, word2vec embedding, protein language embedding, and handcrafted features). The token embedding and word2vec embedding are fed to a Siamese network channel. In the other channel of the model, peptide sequences and their secondary structure sequences are fed into the ProtT5-XL-UniRef50 model to generate embedding features, while handcrafted encoding techniques are used to extract physicochemical information. These two kinds of features are then fused and fed into a bidirectional gated recurrent unit (Bi-GRU) network for further processing. Finally, the outputs of the two channels are combined in a fully connected layer to produce the final prediction. The results on the independent test set indicate that NeuroPpred-MSN exhibits superior predictive performance, with an area under the receiver operating characteristic curve (AUROC) of 98.3%, exceeding the performance of other state-of-the-art predictors. Specifically, compared with the best results of the other predictors, the model improves accuracy (ACC) by 1.52%, F1 score (F1) by 1.52%, Matthews correlation coefficient (MCC) by 3.2%, and AUROC by 1.55%. The model was further evaluated on imbalanced datasets, where it achieved the highest values in AUROC, ACC, MCC, sensitivity (SN), and F1, further demonstrating its robustness and generalization. The model can be accessed at the following GitHub repository: https://github.com/wenjean/NeuroPpred-MSN.
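The evaluation metrics named in the abstract (ACC, SN, F1, MCC, and AUROC) have standard definitions, and a small worked example may help when reading the reported numbers. The following is a minimal sketch, not code from the NeuroPpred-MSN repository: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, since the abstract does not state how predictions were thresholded.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return ACC, SN (sensitivity/recall), F1, and MCC from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "F1": f1, "MCC": mcc}


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is O(n_pos * n_neg), which is fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = neuropeptide, 0 = non-neuropeptide.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
THRESHOLD = 0.5  # assumed cut-off, not taken from the paper
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["AUROC"] = auroc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ACC, SN, F1, and MCC depend on the chosen threshold, whereas AUROC is computed from the raw scores and is threshold-free. MCC uses all four confusion-matrix cells, which is one reason it is commonly reported alongside ACC when classes are imbalanced.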
About the journal:
Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area that is developing rapidly at the forefront of scientific research and technology.
The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and then by publishing them Online First through SpringerLink even before the issue is built or sent to the printer.
The editorial board consists of many leading scientists with international reputations, including Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to build a sound international reputation.