Some assembly required: Comparison of bioinformatic pipelines for analysis of viral metagenomic sequencing from nosocomial respiratory virus outbreaks
Assad Alhaboub, Natalie M. Deschenes, Xena X. Li, Victoria R. Williams, Kevin C. Katz, So Yeon Park, Patryk Aftanas, Henry Wong, Calvin Sjaarda, Kyla Tozer, Finlay Maguire, Jerome Leis, Prameet Sheth, Robert Kozak
Journal of Clinical Virology, volume 181, Article 105877. Published 2025-10-02. DOI: 10.1016/j.jcv.2025.105877
Impact factor: 3.4 (JCR Q2, Virology)
https://www.sciencedirect.com/science/article/pii/S1386653225001192
Abstract
Introduction
Metagenomic sequencing (mGS) is a useful tool for identifying pathogens in patient samples. During nosocomial outbreaks of respiratory viruses, mGS allows for the identification of viral strains and provides insight into their genetic relatedness. Multiple bioinformatic assemblers are available for processing the data, but a comprehensive comparison of their performance for respiratory virus outbreaks has not been conducted.
Methods
This study sequenced samples from five separate nosocomial outbreaks of RNA respiratory viruses. RNA was extracted from the samples and cDNA was synthesized using random hexamers. Libraries were prepared with Nextera DNA Flex and sequenced on an Illumina MiniSeq. The data from each outbreak were analyzed using four different assemblers (MEGAHIT, rnaSPAdes, rnaviralSPAdes, and coronaSPAdes) to evaluate their analytical performance.
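As a rough illustration of how the four assemblers might be run on the same paired-end reads, the sketch below only builds the command lines and does not execute them. The flags shown (`megahit -1/-2/-o`, and `spades.py` with `--rna`, `--rnaviral`, or `--corona`) reflect the tools' documented command-line interfaces as best understood here. The file names, output layout, and the function `build_assembly_commands` are hypothetical, and the abstract does not state which versions or parameters the authors used.

```python
from pathlib import Path
from typing import Dict, List


def build_assembly_commands(reads_r1: str, reads_r2: str, outdir: str) -> Dict[str, List[str]]:
    """Return one argv list per assembler for a single paired-end sample.

    Assumption: MEGAHIT and SPAdes are installed and on PATH.
    Nothing is executed here; the caller decides how to run the commands.
    """
    out = Path(outdir)
    paired = ["-1", reads_r1, "-2", reads_r2]

    # SPAdes selects its RNA, RNA-viral, and coronavirus modes through flags.
    spades_modes = {
        "rnaSPAdes": "--rna",
        "rnaviralSPAdes": "--rnaviral",
        "coronaSPAdes": "--corona",
    }

    commands: Dict[str, List[str]] = {
        # MEGAHIT expects an output directory that does not exist yet.
        "MEGAHIT": ["megahit", *paired, "-o", str(out / "megahit")],
    }
    for name, flag in spades_modes.items():
        commands[name] = ["spades.py", flag, *paired, "-o", str(out / name.lower())]
    return commands


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting commands.
    cmds = build_assembly_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "assemblies")
    for name, argv in cmds.items():
        print(name, "->", " ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` once the inputs exist. Keeping command construction separate from execution makes it easier to confirm that all assemblers receive identical input.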
Results
mGS confirmed the viral identification and provided accurate strain identification for both coronavirus and parainfluenza virus samples. However, the assemblers differed in the size of the largest contig produced and in the proportion of the viral genome that aligned to reference genomes. Notably, coronaSPAdes outperformed the other pipelines for analyzing seasonal coronaviruses, generating more complete assemblies that covered a higher percentage of the viral genome.
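The two comparison metrics named above, largest contig and proportion of the reference genome covered, can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not describe how the metrics were computed. The sketch assumes contigs are available as FASTA text and that alignments to the reference have already been reduced to 1-based inclusive `(start, end)` intervals. All function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def contig_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each contig name (first word of its header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def covered_bases(intervals: Iterable[Tuple[int, int]]) -> int:
    """Count reference positions covered by at least one interval.

    Intervals are 1-based and inclusive; overlapping ones are merged
    so that no position is counted twice.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def genome_fraction(intervals: Iterable[Tuple[int, int]], ref_length: int) -> float:
    """Percentage of the reference genome covered by aligned contigs."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    return 100.0 * covered_bases(intervals) / ref_length


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    fasta = ">contig_1\nACGTACGT\nACGT\n>contig_2\nGGGCCC\n"
    print("largest contig:", max(contig_lengths(fasta).values()))
    print("genome fraction (%):", genome_fraction([(1, 400), (300, 600), (800, 900)], 1000))
```

Merging overlapping intervals before counting matters because several contigs from one assembly can align to the same region, and counting them separately would overstate coverage.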
Conclusion
Recovering a higher percentage of the viral genome sequence is crucial for more detailed characterization, which is especially valuable for outbreak analysis, where viral strains may differ by only a few genetic changes. Comparing assemblers will allow clinical laboratories to determine which bioinformatic pipeline is optimal for helping clinicians better manage outbreaks.
About the journal:
The Journal of Clinical Virology, an esteemed international publication, serves as the official journal for both the Pan American Society for Clinical Virology and The European Society for Clinical Virology. Dedicated to advancing the understanding of human virology in clinical settings, the Journal of Clinical Virology focuses on disseminating research papers and reviews pertaining to the clinical aspects of virology. Its scope encompasses articles discussing diagnostic methodologies and virus-induced clinical conditions, with an emphasis on practicality and relevance to clinical practice.
The journal publishes on topics that include:
• new diagnostic technologies
• nucleic acid amplification and serologic testing
• targeted and metagenomic next-generation sequencing
• emerging pandemic viral threats
• respiratory viruses
• transplant viruses
• chronic viral infections
• cancer-associated viruses
• gastrointestinal viruses
• central nervous system viruses
• one health (excludes animal health)