{"title":"在大量最近的SARS-CoV-2基因组序列中鉴定新的错义突变","authors":"H. Cai, Kimberly K. Cai, Julang Li","doi":"10.20944/preprints202004.0482.v1","DOIUrl":null,"url":null,"abstract":"\n Background. SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December of 2019. Significant country-specific variations in infection and mortality rate have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear as to whether evolution of the virus itself has also contributed to variations in infection and mortality rate. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~ 160 SARS-CoV-2 sequences available until mid-February 2020. 2, 3, 4, 5 By mid-April, > 550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database. Methods. We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger scale, 8,126 hCoV-19 (SARS-CoV-2) sequences from GISAID database. Results. We identified 5 recently emerged mutations in many isolates (up to 40%). Our analysis highlights 5 frequent new mutations that have emerged since late February 2020. These mutations are: one each missense (non-synonymous) mutation in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in 5’UTR (C241T), one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain in the SARS-CoV-2 genomes deposited in GenBank and GISAID. Conclusion. The results of current study indicate that new mutations are emerging as COVID-19 pandemic are spreading to different countries and that geography specific mutants exist. The findings of current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, it may also provide insights into vaccine development and serological response detection for the virus.","PeriodicalId":15066,"journal":{"name":"Journal of Biotechnology and Biomedicine","volume":"96 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Identification of novel missense mutations in a large number of recent SARS-CoV-2 genome sequences\",\"authors\":\"H. Cai, Kimberly K. Cai, Julang Li\",\"doi\":\"10.20944/preprints202004.0482.v1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n Background. SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December of 2019. Significant country-specific variations in infection and mortality rate have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear as to whether evolution of the virus itself has also contributed to variations in infection and mortality rate. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~ 160 SARS-CoV-2 sequences available until mid-February 2020. 2, 3, 4, 5 By mid-April, > 550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database. Methods. We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger scale, 8,126 hCoV-19 (SARS-CoV-2) sequences from GISAID database. Results. We identified 5 recently emerged mutations in many isolates (up to 40%). Our analysis highlights 5 frequent new mutations that have emerged since late February 2020. These mutations are: one each missense (non-synonymous) mutation in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in 5’UTR (C241T), one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain in the SARS-CoV-2 genomes deposited in GenBank and GISAID. Conclusion. The results of current study indicate that new mutations are emerging as COVID-19 pandemic are spreading to different countries and that geography specific mutants exist. The findings of current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, it may also provide insights into vaccine development and serological response detection for the virus.\",\"PeriodicalId\":15066,\"journal\":{\"name\":\"Journal of Biotechnology and Biomedicine\",\"volume\":\"96 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Biotechnology and Biomedicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.20944/preprints202004.0482.v1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biotechnology and Biomedicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20944/preprints202004.0482.v1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
摘要
背景。自2019年12月首次报道以来,SARS-CoV-2感染已蔓延到200多个国家。注意到各国在感染率和死亡率方面存在显著差异。虽然各国在公共卫生应对方面的差异对感染率控制产生了重大影响,但目前尚不清楚病毒本身的进化是否也导致了感染率和死亡率的差异。之前对SARS-CoV-2突变的研究是基于对截至2020年2月中旬的约160个SARS-CoV-2序列的分析。截至4月中旬,已有超过550个SARS-CoV-2序列存入GenBank,超过8200个序列存入GISAID数据库。方法。我们对截至2020年4月11日提交给GenBank的474个SARS-CoV-2基因组进行了序列分析,使用Map to a Reference Assembly和变体/SNP鉴定进行了多次比对。结果在GISAID数据库中的8,126个hCoV-19 (SARS-CoV-2)序列上进行了更大规模的验证。结果。我们在许多分离株中发现了5个最近出现的突变(高达40%)。我们的分析强调了自2020年2月底以来出现的5种频繁的新突变。这些突变是:orf1ab (C1059T), orf3 (G25563T)和orf8 (C27964T)各有一个错义(非同义)突变,一个在5'UTR (C241T),一个在非编码区(G29553A)。最后的突变(G29553A)被发现几乎是美国分离株所独有的。前3个突变是非同义的,导致病毒蛋白序列中的氨基酸替换。除C241T外,在GenBank和GISAID中储存的意大利和西班牙分离株SARS-CoV-2基因组中均不存在鉴定到的所有新突变。结论。目前的研究结果表明,随着COVID-19大流行在不同国家的传播,新的突变正在出现,并且存在地理特异性突变。本研究结果为进一步研究SARS-CoV-2突变对疾病发病率、严重程度和宿主免疫反应的影响奠定了基础。此外,它还可能为该病毒的疫苗开发和血清学反应检测提供见解。
Identification of novel missense mutations in a large number of recent SARS-CoV-2 genome sequences
Background. SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December of 2019. Significant country-specific variations in infection and mortality rate have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear as to whether evolution of the virus itself has also contributed to variations in infection and mortality rate. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~ 160 SARS-CoV-2 sequences available until mid-February 2020. 2, 3, 4, 5 By mid-April, > 550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database. Methods. We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger scale, 8,126 hCoV-19 (SARS-CoV-2) sequences from GISAID database. Results. We identified 5 recently emerged mutations in many isolates (up to 40%). Our analysis highlights 5 frequent new mutations that have emerged since late February 2020. These mutations are: one each missense (non-synonymous) mutation in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in 5’UTR (C241T), one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain in the SARS-CoV-2 genomes deposited in GenBank and GISAID. Conclusion. The results of current study indicate that new mutations are emerging as COVID-19 pandemic are spreading to different countries and that geography specific mutants exist. The findings of current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, it may also provide insights into vaccine development and serological response detection for the virus.