EC number prediction of protein sequences based on combination of hierarchical and global features
Authors: Fan Yang, Qiao-Ling Han, Wen-di Zhao, Yue Zhao
Journal: 遗传 (Hereditas), vol. 46, no. 8, pp. 661-669
Published: 2024-08-01 (journal article)
DOI: 10.16288/j.yczz.24-102
Citations: 0
Abstract
Identifying enzyme function is crucial for understanding the mechanisms of biological processes and for advancing the life sciences. However, existing methods for predicting Enzyme Commission (EC) numbers do not make full use of protein sequence information, and their accuracy remains limited. To address this, we propose an EC number prediction network based on hierarchical and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, then applies hierarchical feature extraction modules and global feature extraction modules to derive hierarchical and global features from them. The predictions obtained from the two feature types are then combined, and a multitask learning framework is used to predict the enzyme's EC number. In experiments, ECPN-HFGF performed best among the compared methods at predicting EC numbers from protein sequences, reaching a macro F1 score of 95.5% and a micro F1 score of 99.0%. By combining hierarchical and global sequence features, ECPN-HFGF predicts EC numbers quickly and accurately. Its prediction accuracy is substantially higher than that of commonly used existing methods, making it an efficient tool for enzymology research and enzyme engineering applications.
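The gap between the two reported scores follows from how they are computed. Macro F1 computes one F1 score per EC class and averages them with equal weight, so a rare EC number counts as much as a common one. Micro F1 pools true positives, false positives and false negatives across all classes before computing a single F1, so frequent classes dominate it. The sketch below illustrates this under a simplifying assumption: each sequence carries exactly one EC number. The function name `f1_scores` and the toy labels are hypothetical and not taken from the paper, which may evaluate at a specific EC level or with multi-label targets.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions.

    Hypothetical helper: assumes exactly one EC number per sequence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_ec, pred_ec in zip(y_true, y_pred):
        if true_ec == pred_ec:
            tp[true_ec] += 1      # correct prediction for this class
        else:
            fp[pred_ec] += 1      # predicted class gets a false positive
            fn[true_ec] += 1      # true class gets a false negative

    def f1(t, p, n):
        # F1 = 2TP / (2TP + FP + FN); 0.0 if the counts are all zero
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Macro: mean of per-class F1, every EC class weighted equally
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool counts over all classes, then compute a single F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data (illustrative only): 8 sequences of a common EC class, 2 of a rare one
y_true = ["1.1.1.1"] * 8 + ["2.7.11.1"] * 2
y_pred = ["1.1.1.1"] * 9 + ["2.7.11.1"]   # one rare-class sequence misassigned

macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
# → macro F1 = 0.804, micro F1 = 0.900
```

With one of the two rare-class sequences misassigned, micro F1 stays at 0.900 while macro F1 falls to 0.804. A macro score below the micro score therefore usually indicates weaker performance on the less common EC numbers.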
About the journal:
Hereditas is a national academic journal sponsored by the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences and the Chinese Society of Genetics, and published by Science Press. It is listed as a Chinese core journal and as a high-quality Chinese scientific journal. The journal mainly publishes innovative research papers in genetics, genomics, cell biology, developmental biology, evolutionary biology, genetic engineering and biotechnology; new technologies and methods; monographs and reviews on topical issues in the discipline; academic debates and discussions; experience in teaching genetics; profiles of prominent geneticists in China and abroad; genetic counseling; and news of academic conferences in China and abroad, among other content. Main columns: Review, Frontier Focus, Research Report, Technology and Method, Resources and Platform, Experimental Protocol Guide, Genetic Resources, Genetics Teaching, and Science News.