Comparative Study on Feature Selection Methods for Protein

Walaa Alkady, Khaled A. ElBahnasy, Walaa K. Gad
International Journal of Intelligent Computing and Information Sciences, published 2022-08-01
DOI: 10.21608/ijicis.2022.144051.1190 (https://doi.org/10.21608/ijicis.2022.144051.1190)

Abstract

Received 2022-06-11; Revised 2022-07-22; Accepted 2022-07-24

The automated, high-throughput identification of protein function is one of the main problems in computational biology. Predicting a protein's structure is a crucial step in this process. In recent years, a wide range of approaches for predicting protein structure has been proposed. They fall into two groups: sequence-based and database-based. Sequence-based methods try to identify the principles behind protein structure and to extract informative features from amino acid sequences. Database-based methods mine pre-existing public annotation databases. This study focuses on the sequence-based approach and exploits the ability of amino acid sequences to predict protein activity. The amino acid composition approach, the amino acid tuple approach, and several optimization algorithms were compared. The experiments used several protein sequence data sets, and five classifiers were tested. The best accuracy was 98% under 10-fold cross-validation, which is the highest performance on the Human dataset.
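The abstract names the amino acid composition and amino acid tuple approaches without defining them. The sketch below shows one common way such features are computed: per-residue frequencies (20 values) and normalized counts of overlapping k-tuples (20^k values). The function names, the choice of k = 2, and the handling of non-standard residues (they are skipped) are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in seq (hypothetical helper)."""
    counts = Counter(seq.upper())
    # Assumption: residues outside the 20 standard letters are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def tuple_composition(seq, k=2):
    """Normalized counts of overlapping k-tuples (hypothetical helper)."""
    seq = seq.upper()
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {key: i for i, key in enumerate(keys)}
    vec = [0.0] * len(keys)
    n = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip tuples containing non-standard residues
            vec[j] += 1
            n += 1
    return [v / n for v in vec] if n else vec
```

Concatenating the two vectors gives a fixed-length representation of a variable-length sequence, which is the form a feature selection step or classifier would typically take as input.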