Enhancing the Feature Representation of Protein Sequence Descriptors in Protein-Protein Interaction Prediction.

IF 3.9 | Zone 2, Biology | Q1 | MATHEMATICAL & COMPUTATIONAL BIOLOGY
Hoai-Nhan Tran, Nguyen-Phuc-Xuan Quynh, Haochen Zhao, Jianxin Wang
Journal: Interdisciplinary Sciences: Computational Life Sciences
Publication date: 2025-06-02
Publication type: Journal Article
DOI: 10.1007/s12539-025-00723-5 (https://doi.org/10.1007/s12539-025-00723-5)
Open access: no
Citations: 0

Abstract

Enhancing the Feature Representation of Protein Sequence Descriptors in Protein-Protein Interaction Prediction.

In recent years, computational methods such as machine learning and deep learning have been increasingly used to solve various bioinformatics problems related to protein sequence data, such as predicting protein interaction, protein function, subcellular location, and so on. The first crucial step in applying these methods is how to represent a protein sequence as an input feature vector, as the feature vector quality significantly impacts the performance of those methods. A range of protein sequence descriptors has been proposed to enhance the quality of protein sequence representation. Existing descriptors extract information that can be obtained from sequences, such as composition, distribution, spatial correlation between amino acids, and so on. However, improvements can still be made in spatial correlation to capture better sequence similarity, which is valuable for Protein-Protein Interaction (PPI) prediction tasks. In this study, our aim is to develop new descriptors based on six well-known sequence descriptors to improve the ability to represent protein sequences. We evaluate the performance of the new descriptors on various PPI datasets. The results demonstrate that the proposed descriptors outperform their original versions in terms of PPI prediction performance. This work also introduces ProtSeqDesc (protein sequence descriptors), a flexible Python package that includes 51 types of feature vectors, covering all proposed descriptors. The software package is aimed at meeting the demand for the application of computational methods in bioinformatics.
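
To make the idea of a sequence descriptor concrete, the sketch below turns a protein sequence into a fixed-length feature vector using two generic, textbook-style descriptors: amino acid composition and the frequency of residue pairs separated by a fixed gap (a simple stand-in for "spatial correlation between amino acids"). It is only an illustration under stated assumptions. The function names are hypothetical, it does not use or reproduce the ProtSeqDesc API, and it is not one of the descriptors proposed in the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines vector positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-dimensional vector: fraction of each standard residue.

    Hypothetical helper for illustration; non-standard letters are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def gapped_pair_frequency(seq, gap):
    """Return a 400-dimensional vector of ordered residue-pair frequencies.

    A pair is (seq[i], seq[i + gap + 1]), i.e. two residues with `gap`
    residues between them. Pairs containing non-standard letters are skipped.
    If the sequence is too short to form any pair, the vector is all zeros.
    """
    seq = seq.upper()
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    vec = [0.0] * (len(AMINO_ACIDS) ** 2)
    n_pairs = 0
    for i in range(len(seq) - gap - 1):
        a, b = seq[i], seq[i + gap + 1]
        if a in index and b in index:
            vec[index[a] * len(AMINO_ACIDS) + index[b]] += 1.0
            n_pairs += 1
    if n_pairs:
        vec = [v / n_pairs for v in vec]
    return vec


def pair_features(seq_a, seq_b, gap=1):
    """Concatenate the descriptors of two proteins into one PPI input vector."""
    features = []
    for seq in (seq_a, seq_b):
        features += amino_acid_composition(seq)
        features += gapped_pair_frequency(seq, gap)
    return features
```

A vector built this way has 2 x (20 + 400) = 840 entries regardless of sequence length, which is what lets a standard classifier consume protein pairs of varying size. Concatenation is just one simple way to combine the two proteins; other combinations are possible.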

Source journal
Interdisciplinary Sciences: Computational Life Sciences (MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 8.60
Self-citation rate: 4.20%
Articles published: 55
Journal description: Interdisciplinary Sciences--Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area that is developing rapidly at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and then by publishing Online First™ through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists with international reputations, including Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions in bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to establish a sound international reputation.