Decision Tree for Classifying Betacoronavirus Species Using Amino Acid Frequencies

M. Alshehri, Manee M. Manee, Ghaida G. Alharthi, Mohanad A Ibrahim, Badr M. Al-Shomrani, Fahad H Alqahtani
2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), published 2021-06-28
DOI: 10.1109/ICAICA52286.2021.9497957 (https://doi.org/10.1109/ICAICA52286.2021.9497957)

Abstract

Betacoronaviruses have drawn significant global attention to emerging infectious diseases. Researchers have used different names for the same Betacoronavirus genome, or the same name for different genomes, which has led to erroneous identification. We propose an approach to Betacoronavirus species classification that uses amino acid bias as the feature input to a decision tree. The dataset contains sequences of the four structural proteins (spike, envelope, membrane, and nucleocapsid) of ten different species. The protein sequences are first converted to an 80-dimensional feature vector in which each element corresponds to the frequency of an amino acid. With this input, the decision tree achieved an accuracy of 99%, indicating that amino acid bias is an effective attribute for classifying Betacoronavirus species. The results show that amino acid frequencies can serve as classification features, and that the approach can classify known Betacoronavirus family members and label them with common names. We also recommend that authors unify the names of these genomes to minimize the ambiguity caused by alternative names.
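
The feature construction described above can be illustrated with a short sketch. The abstract does not state how the 80 dimensions are arranged; the code below assumes they are the frequencies of the 20 standard amino acids computed separately for each of the four structural proteins (4 × 20 = 80) and then concatenated. The function names, the protein order, and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption for this sketch: a fixed protein order for concatenation.
PROTEINS = ("spike", "envelope", "membrane", "nucleocapsid")


def amino_acid_frequencies(sequence):
    """Return the relative frequency of each standard amino acid.

    Non-standard or ambiguous symbols (for example 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(proteins):
    """Concatenate per-protein frequencies into one 80-element vector."""
    vector = []
    for name in PROTEINS:
        vector.extend(amino_acid_frequencies(proteins[name]))
    return vector


# Toy sequences invented for illustration only (not real viral proteins).
example = {
    "spike": "MFVFLVLLPLVSSQ",
    "envelope": "MYSFVSEETGTLIV",
    "membrane": "MADSNGTITVEELK",
    "nucleocapsid": "MSDNGPQNQRNAPR",
}
features = feature_vector(example)
print(len(features))
```

Vectors built this way, one per genome and paired with a species label, could then be passed to any decision tree implementation. The choice of library and tree settings is left open here because the abstract does not specify them.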