Estimated size of the total genome and protein space of viruses
Authors: Congyu Lu, Yifan Wu, Zheng Zhang, Longfei Mao, Xingyi Ge, Aiping Wu, Fengzhu Sun, Yongqiang Jiang, Yousong Peng
Journal: mSphere, e0068324 (Journal Article; impact factor 3.7; JCR Q2, Microbiology)
Published: 2025-02-25
DOI: 10.1128/msphere.00683-24 (https://doi.org/10.1128/msphere.00683-24)
Citations: 0
Abstract
Recent metagenomic studies have identified a vast number of viruses. However, the true genetic diversity of the whole virus community on our planet has not yet been systematically assessed. Here, we explored the genome and protein space of viruses by simulating the process of virus discovery in viral metagenomic studies. Among multiple candidate functions, the power function best fit the increasing trends of virus diversity and was therefore used to predict the genetic space of viruses. The estimate suggests that there are at least 8.23 × 10^8 viral operational taxonomic units and 1.62 × 10^9 viral protein clusters on Earth when saturation of the virus genetic space is assumed, taking into account the balance between costs and the identification of novel viruses. Notably, less than 3% of this viral genetic diversity has been uncovered thus far, which underscores the vastness of the unexplored viral landscape. Saturating the genetic space would require a total of 3.08 × 10^8 samples. Analysis of viral genetic diversity by ecosystem yielded estimates consistent with those above. Furthermore, the estimate of the virus genetic space remained robust when accounting for the redundancy of sampling, sampling time, sequencing platform, and the parameters used for protein clustering. This study provides a guide for future sequencing efforts in virus discovery and contributes to a better understanding of viral diversity in nature.

IMPORTANCE
Viruses are the most abundant and diverse biological entities on Earth. In recent years, a large number of viruses have been discovered through sequencing technology. However, it is not clear how many kinds of viruses exist on Earth. This study estimates that there are at least 823 million types of viruses and 1.62 billion types of viral proteins. Remarkably, less than 3% of this diversity has been uncovered to date. These findings highlight the enormous potential for discovering new viruses and reveal a significant gap in our current understanding of the viral world. This study calls for increased attention and resources to be directed toward viral discovery and metagenomics and provides a guide for future sequencing efforts, enhancing our knowledge of viral diversity in nature for ecology, biology, and public health.
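The power-function fit mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract gives neither the fitted parameters nor the fitting procedure, so the snippet assumes the form y = a * x^b (x = number of samples, y = cumulative diversity) and fits it by ordinary least squares in log-log space. The function names, the values `TRUE_A` and `TRUE_B`, and the accumulation curve are all hypothetical and synthetic. The marginal-gain function shows one possible way to reason about a cost-based saturation point; the study's actual saturation criterion is not described in the abstract.

```python
import math


def fit_power_function(sample_counts, cumulative_units):
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b)."""
    xs = [math.log(x) for x in sample_counts]
    ys = [math.log(y) for y in cumulative_units]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


def predict_units(a, b, n_samples):
    """Extrapolated cumulative diversity after n_samples samples."""
    return a * n_samples ** b


def marginal_gain(a, b, n_samples):
    """Derivative of the fit: expected new units per additional sample."""
    return a * b * n_samples ** (b - 1)


# Synthetic accumulation curve (illustrative values, NOT data from the study).
TRUE_A, TRUE_B = 50.0, 0.8
sample_counts = [10, 100, 1_000, 10_000, 100_000]
cumulative_units = [TRUE_A * n ** TRUE_B for n in sample_counts]

a_hat, b_hat = fit_power_function(sample_counts, cumulative_units)
print(f"fitted a = {a_hat:.4f}, b = {b_hat:.4f}")

# With 0 < b < 1 the curve keeps rising but each new sample yields less,
# which is why a stopping rule has to weigh cost against novelty.
for n in (1_000, 1_000_000):
    print(n, predict_units(a_hat, b_hat, n), marginal_gain(a_hat, b_hat, n))
```

A power function with exponent between 0 and 1 has no finite asymptote, so any "total" derived from it depends on where sampling is assumed to stop. That is consistent with the abstract's framing of the estimates as lower bounds ("at least") under an assumed saturation point.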
About the journal:
mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition for rigorous peer review.