利用多视角特征融合学习加强药物多肽序列预测

IF 2.4 3区 生物学 Q3 BIOCHEMICAL RESEARCH METHODS
Junyu Zhang, Ronglin Lu, Hongmei Zhou, Xinbo Jiang
{"title":"利用多视角特征融合学习加强药物多肽序列预测","authors":"Junyu Zhang, Ronglin Lu, Hongmei Zhou, Xinbo Jiang","doi":"10.2174/0115748936294345240510112941","DOIUrl":null,"url":null,"abstract":"Background: Currently, various types of peptides have broad implications for human health and disease. Some drug peptides play significant roles in sensory science, drug research, and cancer biology. The prediction and classification of peptide sequences are of significant importance to various industries. However, predicting peptide sequences through biological experiments is a time-consuming and expensive process. Moreover, the task of protein sequence classification and prediction faces challenges due to the high dimensionality, nonlinearity, and irregularity of protein sequence data, along with the presence of numerous unknown or unlabeled protein sequences. Therefore, an accurate and efficient method for predicting peptide classification is necessary. Methods: In our work, we used two pre-trained models to extract sequence features, TextCNN (Convolutional Neural Networks for Text Classification) and Transformer. We extracted the overall semantic information of the sequences using Transformer Encoder and extracted the local semantic information between sequences using TextCNN and concatenated them into a new feature. Finally, we used the concatenated feature for classification prediction. To validate this approach, we conducted experiments on the BP dataset, THP dataset and DPP-IV dataset and compared them with some pre-trained models. Results: Since TextCNN and Transformer Encoder extract features from different perspectives, the concatenated feature contains multi-view information, which improves the accuracy of the peptide predictor. Conclusion: Ultimately, our model demonstrated superior metrics, highlighting its efficacy in peptide sequence prediction and classification.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"23 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing Drug Peptide Sequence Prediction Using Multi-view Feature Fusion Learning\",\"authors\":\"Junyu Zhang, Ronglin Lu, Hongmei Zhou, Xinbo Jiang\",\"doi\":\"10.2174/0115748936294345240510112941\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Currently, various types of peptides have broad implications for human health and disease. Some drug peptides play significant roles in sensory science, drug research, and cancer biology. The prediction and classification of peptide sequences are of significant importance to various industries. However, predicting peptide sequences through biological experiments is a time-consuming and expensive process. Moreover, the task of protein sequence classification and prediction faces challenges due to the high dimensionality, nonlinearity, and irregularity of protein sequence data, along with the presence of numerous unknown or unlabeled protein sequences. Therefore, an accurate and efficient method for predicting peptide classification is necessary. Methods: In our work, we used two pre-trained models to extract sequence features, TextCNN (Convolutional Neural Networks for Text Classification) and Transformer. We extracted the overall semantic information of the sequences using Transformer Encoder and extracted the local semantic information between sequences using TextCNN and concatenated them into a new feature. Finally, we used the concatenated feature for classification prediction. To validate this approach, we conducted experiments on the BP dataset, THP dataset and DPP-IV dataset and compared them with some pre-trained models. Results: Since TextCNN and Transformer Encoder extract features from different perspectives, the concatenated feature contains multi-view information, which improves the accuracy of the peptide predictor. Conclusion: Ultimately, our model demonstrated superior metrics, highlighting its efficacy in peptide sequence prediction and classification.\",\"PeriodicalId\":10801,\"journal\":{\"name\":\"Current Bioinformatics\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Current Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.2174/0115748936294345240510112941\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/0115748936294345240510112941","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

背景:目前,各种类型的肽对人类健康和疾病有着广泛的影响。一些药物肽在感官科学、药物研究和癌症生物学中发挥着重要作用。肽序列的预测和分类对各行各业都具有重要意义。然而,通过生物实验预测肽序列是一个耗时且昂贵的过程。此外,由于蛋白质序列数据的高维性、非线性和不规则性,以及存在大量未知或未标记的蛋白质序列,蛋白质序列分类和预测任务面临着挑战。因此,需要一种准确高效的多肽分类预测方法。方法:在我们的工作中,我们使用了两种预先训练好的模型来提取序列特征,即 TextCNN(用于文本分类的卷积神经网络)和 Transformer。我们使用 Transformer 编码器提取序列的整体语义信息,使用 TextCNN 提取序列间的局部语义信息,并将它们串联成一个新特征。最后,我们使用串联特征进行分类预测。为了验证这种方法,我们在 BP 数据集、THP 数据集和 DPP-IV 数据集上进行了实验,并与一些预先训练好的模型进行了比较。实验结果由于 TextCNN 和 Transformer Encoder 从不同角度提取特征,因此合并特征包含了多视角信息,从而提高了肽预测器的准确性。结论最终,我们的模型展示了卓越的指标,突出了其在肽序列预测和分类方面的功效。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Enhancing Drug Peptide Sequence Prediction Using Multi-view Feature Fusion Learning
Background: Currently, various types of peptides have broad implications for human health and disease. Some drug peptides play significant roles in sensory science, drug research, and cancer biology. The prediction and classification of peptide sequences are of significant importance to various industries. However, predicting peptide sequences through biological experiments is a time-consuming and expensive process. Moreover, the task of protein sequence classification and prediction faces challenges due to the high dimensionality, nonlinearity, and irregularity of protein sequence data, along with the presence of numerous unknown or unlabeled protein sequences. Therefore, an accurate and efficient method for predicting peptide classification is necessary. Methods: In our work, we used two pre-trained models to extract sequence features, TextCNN (Convolutional Neural Networks for Text Classification) and Transformer. We extracted the overall semantic information of the sequences using Transformer Encoder and extracted the local semantic information between sequences using TextCNN and concatenated them into a new feature. Finally, we used the concatenated feature for classification prediction. To validate this approach, we conducted experiments on the BP dataset, THP dataset and DPP-IV dataset and compared them with some pre-trained models. Results: Since TextCNN and Transformer Encoder extract features from different perspectives, the concatenated feature contains multi-view information, which improves the accuracy of the peptide predictor. Conclusion: Ultimately, our model demonstrated superior metrics, highlighting its efficacy in peptide sequence prediction and classification.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Current Bioinformatics
Current Bioinformatics 生物-生化研究方法
CiteScore
6.60
自引率
2.50%
发文量
77
审稿时长
>12 weeks
期刊介绍: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications on key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信