Assad Alhaboub, Natalie M. Deschenes, Xena X. Li, Victoria R. Williams, Kevin C. Katz, So Yeon Park, Patryk Aftanas, Henry Wong, Calvin Sjaarda, Kyla Tozer, Finlay Maguire, Jerome Leis, Prameet Sheth, Robert Kozak
Journal of Clinical Virology, volume 181, Article 105877. Published 2025-10-02. DOI: 10.1016/j.jcv.2025.105877
https://www.sciencedirect.com/science/article/pii/S1386653225001192
Citations: 0
Abstract
Some assembly required: Comparison of bioinformatic pipelines for analysis of viral metagenomic sequencing from nosocomial respiratory virus outbreaks
Introduction
Metagenomic sequencing (mGS) is a useful tool for identifying pathogens in patient samples. During nosocomial outbreaks of respiratory viruses, mGS allows for the identification of viral strains and provides insight into their genetic relatedness. Multiple bioinformatic assemblers are available for processing the data, but a comprehensive comparison of their performance for respiratory virus outbreaks has not been conducted.
Methods
This study sequenced samples from five separate nosocomial outbreaks of RNA respiratory viruses. RNA was extracted from the samples, cDNA was synthesized using random hexamers, and libraries were prepared with Nextera DNA Flex and sequenced on an Illumina MiniSeq. The data from each outbreak were analyzed with four assemblers (MEGAHIT, rnaSPAdes, rnaviralSPAdes, and coronaSPAdes) to evaluate their analytical performance.
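As a rough illustration of how one paired-end sample could be passed to the four assemblers, the sketch below only builds the command lines and prints them. It assumes the standard `megahit` executable and the `spades.py` launcher with its `--rna`, `--rnaviral`, and `--corona` modes. The file names, the output directory layout, and the helper `assembler_commands` are hypothetical, and the abstract does not state which versions or parameters the authors used.

```python
import shlex
from pathlib import Path


def assembler_commands(r1: str, r2: str, outdir: str) -> dict[str, list[str]]:
    """Build one command line per assembler for a paired-end sample.

    Nothing is executed here. The flags are the basic paired-end options
    of each tool; all paths are placeholders chosen for illustration.
    """
    out = Path(outdir)
    reads = ["-1", r1, "-2", r2]
    return {
        "MEGAHIT": ["megahit", *reads, "-o", str(out / "megahit")],
        "rnaSPAdes": ["spades.py", "--rna", *reads, "-o", str(out / "rnaspades")],
        "rnaviralSPAdes": ["spades.py", "--rnaviral", *reads, "-o", str(out / "rnaviralspades")],
        "coronaSPAdes": ["spades.py", "--corona", *reads, "-o", str(out / "coronaspades")],
    }


if __name__ == "__main__":
    # Hypothetical input files; replace with real FASTQ paths.
    commands = assembler_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "assemblies")
    for name, cmd in commands.items():
        print(f"{name}: {shlex.join(cmd)}")
```

Keeping the commands as argument lists makes it straightforward to hand each one to `subprocess.run` once the paths point at real data.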
Results
mGS confirmed the viral identification and provided accurate strain identification for both coronavirus and parainfluenza virus samples. However, the assemblers differed in the size of the largest contig produced and in the proportion of the viral genome that aligned to reference genomes. Notably, coronaSPAdes outperformed the other pipelines for seasonal coronaviruses, generating more complete data and covering a higher percentage of the viral genome.
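The two comparison metrics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: contig alignments are assumed to be available as 1-based inclusive `(start, end)` intervals on the reference, and the function names and all numbers are hypothetical. Assembly-evaluation tools such as QUAST report similar metrics, but the abstract does not say which tool was used.

```python
def largest_contig(contig_lengths):
    """Length of the longest contig, or 0 if the assembly is empty."""
    return max(contig_lengths, default=0)


def genome_fraction(aligned_intervals, reference_length):
    """Percent of the reference covered by the union of aligned intervals.

    Intervals are 1-based and inclusive; overlapping or adjacent intervals
    are merged so that no reference position is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(aligned_intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(largest_contig([500, 1200, 300]))
    print(genome_fraction([(1, 100), (50, 150), (201, 300)], 400))
```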
Conclusion
Recovering a higher percentage of the viral genome sequence is crucial for more detailed characterization, which is especially valuable in outbreak analysis, where viral strains may differ by only a few genetic changes. Comparing assemblers allows clinical laboratories to determine which bioinformatic pipeline is optimal for helping clinicians manage outbreaks.
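To make the link between genome coverage and strain resolution concrete, the sketch below counts nucleotide differences between two aligned consensus sequences, skipping positions where either sequence has a gap or an ambiguous base. It is an illustrative assumption about how such a comparison might be done, not a method described in the abstract; the function name and the sequences are hypothetical.

```python
def pairwise_differences(seq_a, seq_b):
    """Count differing sites between two aligned consensus sequences.

    Returns (differences, compared_sites). Only positions where both
    sequences carry an unambiguous base (A, C, G, or T) are compared, so
    a less complete assembly leaves fewer sites that can tell strains apart.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in unambiguous and b in unambiguous:
            compared += 1
            if a != b:
                differences += 1
    return differences, compared


if __name__ == "__main__":
    # Toy sequences; the N marks a position that was not recovered.
    print(pairwise_differences("ACGTNACGT", "ACGTAACTT"))
```

If the single differing site in the toy example had fallen on an uncovered position, the two sequences would have looked identical, which is why a higher genome fraction matters when strains differ by only a few changes.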
About the journal:
The Journal of Clinical Virology, an esteemed international publication, serves as the official journal for both the Pan American Society for Clinical Virology and The European Society for Clinical Virology. Dedicated to advancing the understanding of human virology in clinical settings, the Journal of Clinical Virology focuses on disseminating research papers and reviews pertaining to the clinical aspects of virology. Its scope encompasses articles discussing diagnostic methodologies and virus-induced clinical conditions, with an emphasis on practicality and relevance to clinical practice.
The journal publishes on topics that include:
• new diagnostic technologies
• nucleic acid amplification and serologic testing
• targeted and metagenomic next-generation sequencing
• emerging pandemic viral threats
• respiratory viruses
• transplant viruses
• chronic viral infections
• cancer-associated viruses
• gastrointestinal viruses
• central nervous system viruses
• one health (excludes animal health)