P. Mukherjee, Y. Badr, Srushti Karvekar, Shanmugapriya Viswanathan
Journal of Digital Science. Published 2021-12-28. DOI: 10.33847/2686-8296.3.2_1 (https://doi.org/10.33847/2686-8296.3.2_1)
Coronavirus Genome Sequence Similarity and Protein Sequence Classification
The world is currently going through a serious pandemic caused by the coronavirus disease (COVID-19). In this study, we investigate the gene structure similarity of coronavirus genomes isolated from COVID-19 patients, Severe Acute Respiratory Syndrome (SARS) patients, and bats. We also measure the extent of similarity between these genome structures to determine whether the new coronavirus resembles either of the other two. Our experimental results show 82.42% similarity between the CoV-2 genome structure and the bat genome structure. Moreover, we use two deep learning models, a bidirectional Gated Recurrent Unit (GRU) model and an improved variant of recurrent neural networks (the Bidirectional Long Short-Term Memory model, Bi-LSTM), to classify the protein families of these genomes and isolate the prominent protein family accession. The GRU model achieves 98% accuracy in classifying labeled protein sequences into protein families. Comparing the two models, we found that the GRU model is 1.6% more accurate than the Bi-LSTM model on our multiclass protein classification problem. Our experimental results may further support medical research that targets protein family similarity to better understand the coronavirus genomic structure.
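The abstract reports a percentage similarity between genomes but does not say how it was computed. As an illustration only, the sketch below assumes one common definition: percent identity over a global (Needleman-Wunsch) alignment. The function name `percent_identity` and the scoring values (match +1, mismatch -1, gap -1) are hypothetical choices, not taken from the paper.

```python
def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of a global alignment of a and b.

    Assumption: identity = matching columns / alignment length * 100.
    The scoring scheme is illustrative, not the authors' method.
    """
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 0.0

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner, counting matching columns.
    i, j = n, m
    matches = 0
    length = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1

    return 100.0 * matches / length


# Toy sequences, not real genome data.
print(percent_identity("ACGT", "ACGT"))  # 100.0
print(percent_identity("ACGT", "ACCT"))  # 75.0
```

This version stores the full scoring matrix, so its memory use grows with the product of the two sequence lengths. That is fine for short fragments, but comparing complete coronavirus genomes would call for a dedicated alignment tool or a memory-efficient variant.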