Pei-Ju Chin, Jaysheel D Bhavsar, Trent J Bosma, Madolyn L MacDonald, Shawn W Polson, Arifa S Khan
Refinement of the Reference Viral Database (RVDB) for improving bioinformatics analysis of virus detection by high-throughput sequencing (HTS)
mSphere, e0028625. Published 2025-06-23. DOI: https://doi.org/10.1128/msphere.00286-25
Citation count: 0
Abstract
All biological products are required to demonstrate the absence of adventitious viruses (AVs), which may be inadvertently introduced at different steps of the manufacturing process. The currently recommended in vitro and in vivo virus detection assays have limited breadth of detection and are lengthy and laborious. Additionally, the use of animals is discouraged by the global 3Rs initiative for replacement, reduction, and refinement. High-throughput or next-generation sequencing (HTS/NGS) technologies can rapidly detect known and novel viruses in biological materials. There are, however, challenges for HTS detection of AVs due to the differential abundance of viral sequences in public databases, which led to the creation of a non-redundant Reference Viral Database (RVDB) containing all viral, viral-like, and viral-related sequences, with reduced cellular sequence content. In this paper, we describe improvements in RVDB, which include transitioning the RVDB production scripts from the original Python 2 codebase to Python 3, updating the semantic pipeline to remove misannotated non-viral sequences and irrelevant viral sequences, using taxonomy to remove phages, and adding a quality-check step for SARS-CoV-2 genomes to exclude low-quality sequences. Additionally, RVDB website updates include search tools for exploring the database sequences and an automatic pipeline that provides annotation information to distinguish non-viral and viral sequences in the database. These updates for refining RVDB are expected to enhance HTS bioinformatics by reducing computational time and increasing the accuracy of virus detection.
Importance
High-throughput sequencing (HTS) has emerged as an advanced technology for demonstrating the safety of biological products. HTS can be used as an alternative adventitious virus detection method to replace the currently recommended in vivo and PCR assays and to supplement or replace the in vitro cell culture assays. However, HTS bioinformatics analysis for broad virus detection, including both known and novel viruses, depends on a comprehensive and accurately annotated database. In this study, we have refined our original comprehensive Reference Viral Database (RVDB) for greater accuracy of virus detection with a reduced computational burden. Additionally, the production script for automating the generation of RVDB was updated to facilitate reliable database production and timely availability.
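The quality-check step for SARS-CoV-2 genomes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which criteria RVDB applies, so the two thresholds below (a minimum genome length and a maximum fraction of ambiguous bases) and all function names are hypothetical assumptions chosen for illustration; this is not the RVDB production code.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical thresholds: the abstract does not specify RVDB's actual criteria.
MIN_LENGTH = 29000              # assumed minimum length for a near-complete genome (nt)
MAX_AMBIGUOUS_FRACTION = 0.01   # assumed ceiling on the share of non-ACGT bases


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of bases that are not A, C, G, or T (case-insensitive)."""
    if not seq:
        return 1.0  # treat an empty record as fully ambiguous
    upper = seq.upper()
    unambiguous = sum(upper.count(base) for base in "ACGT")
    return 1.0 - unambiguous / len(seq)


def passes_qc(seq: str,
              min_length: int = MIN_LENGTH,
              max_ambiguous: float = MAX_AMBIGUOUS_FRACTION) -> bool:
    """Accept a sequence only if it is long enough and has few ambiguous bases."""
    return len(seq) >= min_length and ambiguous_fraction(seq) <= max_ambiguous


def filter_genomes(fasta_text: str,
                   min_length: int = MIN_LENGTH,
                   max_ambiguous: float = MAX_AMBIGUOUS_FRACTION) -> Dict[str, str]:
    """Return a header -> sequence mapping of the records that pass the QC check."""
    return {
        header: seq
        for header, seq in parse_fasta(fasta_text)
        if passes_qc(seq, min_length, max_ambiguous)
    }
```

Calling `filter_genomes(fasta_text)` on the contents of a FASTA file returns only the records that meet both thresholds. A filter of this kind keeps heavily masked or truncated genomes out of a reference database, which is consistent with the stated goal of reducing computational time and improving detection accuracy.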
About the journal:
mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions of all high-quality work that makes fundamental contributions to our understanding of microbiology will be encouraged. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.