Enhancing Drug Peptide Sequence Prediction Using Multi-view Feature Fusion Learning

IF 2.9 | Zone 3, Biology | Q3, Biochemical Research Methods

Current Bioinformatics, Pub Date: 2024-05-27, DOI: 10.2174/0115748936294345240510112941

Junyu Zhang, Ronglin Lu, Hongmei Zhou, Xinbo Jiang

Abstract

Background: Peptides have broad implications for human health and disease, and some drug peptides play significant roles in sensory science, drug research, and cancer biology. Predicting and classifying peptide sequences is therefore important to many industries. However, doing so through biological experiments is time-consuming and expensive. Protein sequence classification and prediction are further complicated by the high dimensionality, nonlinearity, and irregularity of protein sequence data, and by the large number of unknown or unlabeled protein sequences. An accurate and efficient method for predicting peptide classification is therefore needed.

Methods: We used two pre-trained models, TextCNN (Convolutional Neural Networks for Text Classification) and Transformer, to extract sequence features. The Transformer Encoder extracted the overall semantic information of the sequences, and TextCNN extracted the local semantic information between sequences. The two outputs were concatenated into a new feature, which was then used for classification prediction. To validate this approach, we conducted experiments on the BP, THP, and DPP-IV datasets and compared the results with those of several pre-trained models.

Results: Because TextCNN and the Transformer Encoder extract features from different perspectives, the concatenated feature contains multi-view information, which improves the accuracy of the peptide predictor.

Conclusion: Our model achieved superior metrics, demonstrating its efficacy in peptide sequence prediction and classification.
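The fusion step described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no architectural details, so the embedding size, kernel sizes, filter count, single attention layer, mean pooling, and the example peptide string are all assumptions made here for illustration, and the weights are random rather than pre-trained. The sketch shows only how a global (attention-based) view and a local (convolution-based) view of the same sequence can be concatenated into one feature vector.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def embed(seq, table):
    """Look up one embedding row per residue; returns shape (len(seq), d)."""
    return table[[AA_INDEX[aa] for aa in seq]]


def local_view(x, kernels):
    """TextCNN-style view: 1-D convolutions followed by max-over-time pooling."""
    pooled = []
    for w in kernels:  # each w has shape (kernel_size, d, n_filters)
        k = w.shape[0]
        # Stack every length-k window of residues: (L - k + 1, k, d)
        windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
        # Convolution as a windowed dot product, then ReLU
        feats = np.maximum(np.einsum("lkd,kdf->lf", windows, w), 0.0)
        pooled.append(feats.max(axis=0))  # strongest response per filter
    return np.concatenate(pooled)


def global_view(x, wq, wk, wv):
    """Transformer-encoder-style view: one self-attention layer, mean-pooled."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return (attn @ v).mean(axis=0)


def fused_feature(seq, params):
    """Concatenate the global and local views into one multi-view feature."""
    x = embed(seq, params["embedding"])
    return np.concatenate([
        global_view(x, params["wq"], params["wk"], params["wv"]),
        local_view(x, params["kernels"]),
    ])


def init_params(d=16, n_filters=8, kernel_sizes=(2, 3, 4), seed=0):
    """Random stand-ins for weights that would normally be pre-trained."""
    rng = np.random.default_rng(seed)
    return {
        "embedding": rng.normal(size=(len(AMINO_ACIDS), d)),
        "wq": rng.normal(size=(d, d)) / np.sqrt(d),
        "wk": rng.normal(size=(d, d)) / np.sqrt(d),
        "wv": rng.normal(size=(d, d)) / np.sqrt(d),
        "kernels": [rng.normal(size=(k, d, n_filters)) / np.sqrt(k * d)
                    for k in kernel_sizes],
    }


params = init_params()
feature = fused_feature("GLFDIVKKVV", params)  # hypothetical example peptide
print(feature.shape)  # (40,): 16 global dims + 3 kernel sizes * 8 filters
```

In a full model the concatenated vector would be passed to a classifier head and all weights trained end to end; that part is omitted here because the abstract does not describe it.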

Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

About the journal: Current Bioinformatics aims to publish the latest and most outstanding developments in bioinformatics. Each issue contains timely in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.