Refinement of the Reference Viral Database (RVDB) for improving bioinformatics analysis of virus detection by high-throughput sequencing (HTS).

IF 3.7, Zone 2 (Biology), JCR Q2 (Microbiology)
mSphere Pub Date : 2025-06-23 DOI:10.1128/msphere.00286-25
Pei-Ju Chin, Jaysheel D Bhavsar, Trent J Bosma, Madolyn L MacDonald, Shawn W Polson, Arifa S Khan

Abstract

All biological products are required to demonstrate the absence of adventitious viruses (AVs), which may be inadvertently introduced at different steps of the manufacturing process. The currently recommended in vitro and in vivo virus detection assays have limitations for broad detection and are lengthy and laborious. Additionally, the use of animals is discouraged by the global 3Rs initiative for replacement, reduction, and refinement. High-throughput or next-generation sequencing (HTS/NGS) technologies can rapidly detect known and novel viruses in biological materials. There are, however, challenges for HTS detection of AVs due to the differential abundance of viral sequences in public databases, which led to the creation of a non-redundant Reference Viral Database (RVDB) containing all viral, viral-like, and viral-related sequences, with a reduced cellular sequence content. In this paper, we describe improvements in RVDB, which include the transition of the RVDB production scripts from the original Python 2 codebase to Python 3, an update of the semantic pipeline to remove misannotated non-viral sequences and irrelevant viral sequences, the use of taxonomy for the removal of phages, and the inclusion of a quality-check step for SARS-CoV-2 genomes to exclude low-quality sequences. Additionally, RVDB website updates include search tools for exploring the database sequences and the implementation of an automatic pipeline that provides annotation information to distinguish non-viral and viral sequences in the database. These updates for refining RVDB are expected to enhance HTS bioinformatics by reducing the computational time and increasing the accuracy of virus detection.

IMPORTANCE

High-throughput sequencing (HTS) has emerged as an advanced technology for demonstrating the safety of biological products. HTS can be used as an alternative adventitious virus detection method, replacing the currently recommended in vivo and PCR assays and supplementing or replacing the in vitro cell culture assays. However, HTS bioinformatics analysis for broad virus detection, including both known and novel viruses, depends on using a comprehensive and accurately annotated database. In this study, we have refined our original comprehensive Reference Viral Database (RVDB) for greater accuracy of virus detection with a reduced computational burden. Additionally, the production script for automating the generation of RVDB was updated to facilitate reliable database production and timely availability.
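
The abstract mentions a quality-check step that excludes low-quality SARS-CoV-2 genomes but does not state the criteria used. As a rough illustration only, such a filter could be sketched in Python 3 as follows. The function names, the minimum-length cutoff, and the ambiguous-base (N) threshold are all assumptions made for this example and are not taken from the RVDB production scripts.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def passes_quality(seq: str,
                   min_length: int = 29000,
                   max_n_fraction: float = 0.01) -> bool:
    """Hypothetical check: reject short or N-rich sequences.

    Both thresholds are illustrative assumptions, not RVDB's values.
    """
    if not seq or len(seq) < min_length:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return n_fraction <= max_n_fraction


def filter_genomes(text: str, **thresholds) -> List[Tuple[str, str]]:
    """Keep only the FASTA records that pass the quality check."""
    return [(h, s) for h, s in read_fasta(text)
            if passes_quality(s, **thresholds)]
```

Calling `filter_genomes(fasta_text, min_length=..., max_n_fraction=...)` returns the retained records as a list of header and sequence pairs. A production pipeline would more likely stream records from disk and might apply further criteria, which the abstract does not specify.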

Source journal
mSphere
Subject area: Immunology and Microbiology - Microbiology
CiteScore: 8.50
Self-citation rate: 2.10%
Articles published: 192
Review time: 11 weeks
Journal description: mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition for rigorous peer review.