Impact of Multi-Factor Features on Protein Secondary Structure Prediction

IF 4.8; Zone 2, Biology; JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY
Biomolecules Pub Date : 2024-09-13 DOI:10.3390/biom14091155
Benzhi Dong, Zheng Liu, Dali Xu, Chang Hou, Na Niu, Guohua Wang
Citations: 0

Abstract

Protein secondary structure prediction (PSSP) plays a crucial role in elucidating protein functions and properties. The field has made significant progress in recent years, and an important technical route to higher prediction accuracy is the use of multiple protein-related features, including amino acid sequences, position-specific scoring matrices (PSSM), amino acid properties, and secondary structure trend factors. However, current work lacks a comprehensive evaluation of how these factor features affect secondary structure prediction. This study quantitatively analyzes the impact of several major factors on secondary structure prediction models using four classes of more interpretable machine learning methods. It examines in detail the applicability of each factor to the different types of methods, the extent to which each method exploits each factor, and the effect of multi-factor combinations. The experiments show that PSSM performs best in methods with strong capabilities for handling high-dimensional features and extracting complex features, while amino acid sequences, although performing poorly overall, perform relatively well in methods with strong linear processing capabilities. In addition, combining amino acid properties with trend factors significantly improved prediction performance. This study provides empirical evidence that future researchers can use to optimize multi-factor feature combinations and enhance the performance of protein secondary structure prediction models.
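As a rough illustration of what a multi-factor feature combination can look like in practice, the sketch below concatenates several per-residue factors into one feature vector. It is not the authors' pipeline: the function names, the one-hot encoding, and the choice of the Kyte-Doolittle hydropathy index as the example amino acid property are all assumptions made for illustration, and secondary structure trend factors are left out because the abstract does not define them.

```python
# Illustrative sketch only; names and encodings are assumptions, not the paper's method.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as one example amino acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(residue):
    """Return a 20-dimensional one-hot vector; unknown residues map to all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def combine_features(sequence, pssm=None, use_sequence=True, use_property=True):
    """Concatenate the selected factors into one feature vector per residue.

    pssm is an optional list with one 20-value row per residue; how it is
    computed or loaded is outside the scope of this sketch.
    """
    if pssm is not None and len(pssm) != len(sequence):
        raise ValueError("PSSM must have one row per residue")
    rows = []
    for i, residue in enumerate(sequence):
        row = []
        if use_sequence:
            row.extend(one_hot(residue))          # amino acid sequence factor
        if pssm is not None:
            row.extend(float(v) for v in pssm[i])  # PSSM factor
        if use_property:
            row.append(HYDROPATHY.get(residue, 0.0))  # amino acid property factor
        rows.append(row)
    return rows


features = combine_features("MKV")
print(len(features), len(features[0]))  # one row per residue, 20 one-hot values + 1 property
```

Switching factors on and off through the keyword arguments gives a simple way to compare single-factor and multi-factor inputs on the same classifier, which is the kind of comparison the abstract describes.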
Source journal: Biomolecules (Biochemistry, Genetics and Molecular Biology: Molecular Biology)
CiteScore: 9.40
Self-citation rate: 3.60%
Articles published: 1640
Review time: 18.28 days
About the journal: Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.