EC number prediction of protein sequences based on combination of hierarchical and global features.

JCR quartile: Q3 (Medicine)
Fan Yang, Qiao-Ling Han, Wen-di Zhao, Yue Zhao
Journal: Yi chuan = Hereditas / Zhongguo yi chuan xue hui bian ji
Publication type: Journal Article
Publication date: 2024-08-01
DOI: 10.16288/j.yczz.24-102 (https://doi.org/10.16288/j.yczz.24-102)
Citations: 0

Abstract

Identifying enzyme function is central to understanding the mechanisms of biological activity and to advancing the life sciences. However, existing Enzyme Commission (EC) number prediction methods do not fully exploit protein sequence information, and their accuracy remains limited. To address this, we propose an EC number prediction network that uses hierarchical features and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, then applies hierarchical feature extraction modules and global feature extraction modules to extract hierarchical and global features from those sequences. The predictions from the two feature types are then combined, and a multitask learning framework is used to predict EC numbers accurately. In experiments, ECPN-HFGF performed best at predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. By combining hierarchical and global features of protein sequences, ECPN-HFGF enables rapid and accurate EC number prediction. Compared with commonly used methods, it offers significantly higher prediction accuracy, providing an efficient approach for enzymology research and enzyme engineering applications.
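
The abstract reports two scores, macro F1 and micro F1, and the gap between them (95.5% versus 99.0%) is easier to read with the definitions in hand. The sketch below is not the authors' evaluation code. It assumes a simplified single-label setting (one EC number per sequence), which the abstract does not confirm, and the function name `f1_scores` and the toy labels are hypothetical. Macro F1 averages the per-class F1 scores, so rare classes weigh as much as common ones. Micro F1 pools the counts over all classes, so frequent classes dominate.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: a single F1 computed from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy labels (illustrative EC numbers, not data from the paper):
# a frequent class predicted perfectly and a rare class predicted half right.
y_true = ["1.1.1.1"] * 8 + ["3.4.21.4"] * 2
y_pred = ["1.1.1.1"] * 8 + ["1.1.1.1", "3.4.21.4"]

macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

On this toy input the single error falls on the rare class, which pulls macro F1 below micro F1. A gap in that direction is consistent with errors concentrated in less frequent classes, although the abstract itself does not break the results down by class.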
