Identification of Geographic Specific SARS-Cov-2 Mutations by Random Forest Classification and Variable Selection Methods.

Impact factor: 0.3; JCR Q4 (Statistics & Probability)
Statistics and Applications Pub Date : 2020-07-01 Epub Date: 2020-06-30
Manoj Kandpal, Ramana V Davuluri
Statistics and Applications, vol. 18, no. 1, pp. 253-268. Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514111/pdf/nihms-1620642.pdf

Abstract

RNA viral genomes have very high mutation rates. As an infection spreads through host populations, different viral lineages emerge and acquire independent mutations, which can lead to varied infection and death rates in different parts of the world. Using Random Forest classification and feature selection methods, we developed an analysis pipeline for identifying geographic-specific mutations and classifying different viral lineages, focusing on the missense variants that alter the function of the encoded proteins. We applied the pipeline to publicly available SARS-CoV-2 datasets and demonstrated that it accurately identified country- or region-specific viral lineages and the specific mutations that discriminate between lineages. The results presented here can help in designing country-specific diagnostic strategies and in prioritizing mutations for functional interpretation and experimental validation.
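
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (random forest classification of region labels from mutation presence/absence, followed by ranking mutations by importance). It is not the authors' pipeline. The mutation names, region labels, and data are synthetic placeholders, and the small forest is written with the Python standard library purely for illustration; an established library implementation would normally be used instead.

```python
import random
from collections import Counter

# Hypothetical placeholder names; "mut_A" is built to be region-specific.
MUTATIONS = ["mut_A", "mut_B", "mut_C", "mut_D", "mut_E", "mut_F"]

def make_toy_data(n=200, seed=1):
    """Synthetic genomes: 0/1 flags for each mutation plus a region label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        region = "region_1" if i % 2 == 0 else "region_2"
        row = [rng.randint(0, 1) for _ in MUTATIONS]  # background noise
        row[0] = 1 if region == "region_2" else 0     # marker mutation
        X.append(row)
        y.append(region)
    return X, y

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, rows, n_try, rng, importance, depth=0, max_depth=5):
    """Grow one tree on binary features; a leaf is a label, a node a tuple."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    parent, best = gini(labels), None
    # Consider only a random subset of features at each node.
    for f in rng.sample(range(len(X[0])), n_try):
        left = [i for i in rows if X[i][f] == 0]
        right = [i for i in rows if X[i][f] == 1]
        if not left or not right:
            continue
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(rows)
        if best is None or parent - child > best[0]:
            best = (parent - child, f, left, right)
    if best is None:
        return majority
    gain, f, left, right = best
    importance[f] += gain * len(rows)  # impurity decrease, weighted by node size
    return (f,
            grow_tree(X, y, left, n_try, rng, importance, depth + 1, max_depth),
            grow_tree(X, y, right, n_try, rng, importance, depth + 1, max_depth))

def random_forest(X, y, n_trees=50, seed=0):
    """Fit bootstrap-sampled trees; return them with normalized importances."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_try = max(1, int(n_features ** 0.5))
    importance = [0.0] * n_features
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(grow_tree(X, y, rows, n_try, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict(tree, x):
    """Follow the splits of one tree down to a leaf label."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] == 1 else left
    return tree

def forest_predict(trees, x):
    """Majority vote over all trees."""
    return Counter(predict(t, x) for t in trees).most_common(1)[0][0]

X, y = make_toy_data()
trees, importances = random_forest(X, y)
ranked = sorted(zip(MUTATIONS, importances), key=lambda p: p[1], reverse=True)
# Accuracy on the training data only; a real analysis needs held-out samples.
accuracy = sum(forest_predict(trees, x) == label for x, label in zip(X, y)) / len(y)
print(ranked[0][0], round(accuracy, 2))
```

In this toy setup the mutation that was constructed to track the region label should rank first, which is the sense in which importance ranking can flag candidate geographic-specific mutations. Real data would also call for held-out evaluation and a formal variable selection step.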
