M. Alshehri, Manee M. Manee, Ghaida G. Alharthi, Mohanad A Ibrahim, Badr M. Al-Shomrani, Fahad H Alqahtani
2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), published 2021-06-28. DOI: 10.1109/ICAICA52286.2021.9497957 (https://doi.org/10.1109/ICAICA52286.2021.9497957)
Decision Tree for Classifying Betacoronavirus Species Using Amino Acid Frequencies
Emerging infectious diseases have received significant global attention because of Betacoronaviruses. Researchers have used different names for the same Betacoronavirus genome, or the same name for different genomes, resulting in erroneous identification. We propose an approach for Betacoronavirus species classification that uses amino acid bias as the feature input to a decision tree. The dataset contains sequences of the four structural proteins (spike, envelope, membrane, and nucleocapsid) of ten different species. The protein sequences are first converted to an 80-dimensional feature vector in which each element corresponds to the frequency of an amino acid. Using this input, the decision tree achieved an accuracy of 99%, indicating that amino acid bias is an effective attribute for classifying Betacoronavirus species. The results show that amino acid frequencies can serve as classification features and that the resulting model can classify known Betacoronavirus members and label them with common names. We also recommend that authors unify the names of these genomes to minimize ambiguity caused by alternative names.
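The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 80 dimensions arise as 20 standard amino acid frequencies for each of the four structural proteins, concatenated in a fixed order. The names `AMINO_ACIDS`, `PROTEINS`, `amino_acid_frequencies`, and `feature_vector` are hypothetical, and the sequences are made-up toy strings.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes (assumed feature alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fixed protein order for the concatenated vector (an assumption of this sketch).
PROTEINS = ("spike", "envelope", "membrane", "nucleocapsid")


def amino_acid_frequencies(sequence):
    """Return 20 relative frequencies for one protein sequence.

    Characters outside the 20-letter alphabet (e.g. 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(proteins):
    """Concatenate per-protein frequencies into one 4 x 20 = 80 element vector."""
    vector = []
    for name in PROTEINS:
        vector.extend(amino_acid_frequencies(proteins[name]))
    return vector


# Made-up toy sequences, for illustration only.
example = {
    "spike": "AAAC",
    "envelope": "GGLV",
    "membrane": "MKTW",
    "nucleocapsid": "SSRN",
}
vec = feature_vector(example)
print(len(vec))
```

Each genome then contributes one such vector with its species label, and the labelled vectors can be passed to any decision tree implementation for training.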