Addressing data management and analysis challenges in viral genomics: The Swiss HIV cohort study viral next generation sequencing database.

IF 7.7
PLOS digital health | Pub Date: 2025-04-21 | eCollection Date: 2025-04-01 | DOI: 10.1371/journal.pdig.0000825
Marius Zeeb, Paul Frischknecht, Suraj Balakrishna, Lisa Jörimann, Jasmin Tschumi, Levente Zsichla, Sandra E Chaudron, Bashkim Jaha, Kathrin Neumann, Christine Leemann, Michael Huber, Karoline Leuzinger, Huldrych F Günthard, Karin J Metzner, Roger D Kouyos
Journal Article. PLOS digital health 4(4): e0000825. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12011223/pdf/

Abstract

Numerous HIV-related outcomes can be determined from the viral genome, for example resistance-associated mutations, population transmission dynamics, viral heritability traits, or time since infection. Viral sequences from people with HIV (PWH) are therefore essential for therapeutic and research purposes. In the first three decades of the HIV pandemic, viral genomes were mainly sequenced using Sanger sequencing; the last decade has seen a shift towards next-generation sequencing (NGS) as the preferred method. NGS can achieve near-full-length genome coverage and, at the same time, accurately captures within-host diversity by characterizing HIV subpopulations. NGS opens new avenues for HIV research, but it also presents challenges in data management and analysis. We therefore set up the Swiss HIV Cohort Study Viral NGS Database (SHCND) to address key issues in the handling of NGS data, including large volumes of raw and processed NGS data, data storage solutions, downstream application of sophisticated bioinformatic tools, high-performance computing resources, and reproducibility. The database is nested within the Swiss HIV Cohort Study (SHCS) and the Zurich Primary HIV Infection Cohort Study (ZPHI), which together have enrolled 21,876 PWH since 1988 and include a biobank dating back to the early 1990s. Since its initiation in 2018, the SHCND has accumulated NGS sequences (of plasma and proviral origin) from 5,178 unique PWH. Here we describe the design, set-up, and use of this NGS database. Overall, the SHCND has contributed to several research projects on HIV pathogenesis, treatment, drug resistance, and molecular epidemiology, and has thereby become a central part of HIV genomics research in Switzerland.
