Yizhou Peter Huang, Lauren Harmon, Eve Deering-Gardner, Xiaotu Ma, Josiah Harsh, Zhaoyu Xue, Hong Wen, Marcel Ramos, Sean Davis, Timothy J Triche
bamSliceR: a Bioconductor package for rapid, cross-cohort variant and allelic bias analysis
Bioinformatics Advances, volume 5, issue 1, vbaf098. Published 2025-04-28. DOI: https://doi.org/10.1093/bioadv/vbaf098
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089696/pdf/
Abstract
Motivation: The National Cancer Institute Genomic Data Commons (GDC) provides controlled access to sequencing data from thousands of subjects, enabling large-scale study of impactful genetic alterations such as simple and complex germline and structural variants. However, efficient analysis requires significant computational resources and expertise, especially when calling variants from raw sequence reads. To solve these problems, we developed bamSliceR, an R/Bioconductor package that builds upon the GenomicDataCommons package to extract aligned sequence reads from cross-GDC meta-cohorts, followed by targeted analysis of variants and effects (including transcript-aware variant annotation from transcriptome-aligned GDC RNA data).
Results: Here, we demonstrate population-scale genomic and transcriptomic analyses with minimal compute burden using bamSliceR, identifying recurrent, clinically relevant sequence and structural variants in the TARGET acute myeloid leukemia (AML) and BEAT-AML cohorts. We then validate these results in the (non-GDC) Leucegene cohort, demonstrating how the bamSliceR pipeline can be seamlessly applied to replicate findings in non-GDC cohorts. These variants directly yield clinically impactful and biologically testable hypotheses for mechanistic investigation.
Availability and implementation: bamSliceR has been submitted to the Bioconductor project, where it is presently under review, and is available on GitHub at https://github.com/trichelab/bamSliceR.
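Illustrative sketch (not part of the abstract): the workflow described under Motivation is to slice aligned reads for a few target regions and then tally variants in them. The Python below is a minimal sketch of that idea, not bamSliceR's own interface, which is an R package. It assumes the public GDC BAM-slicing endpoint (`/slicing/view/<file UUID>` with repeated `region` parameters); controlled-access files additionally need a token sent in an `X-Auth-Token` header. The function names and the example file identifier are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public GDC BAM-slicing endpoint.
GDC_SLICING_ENDPOINT = "https://api.gdc.cancer.gov/slicing/view"


def build_slice_url(file_uuid, regions):
    """Build a slicing URL for one BAM file and one or more regions.

    Regions use the usual 'chr', 'chr:start' or 'chr:start-end' form.
    """
    if not regions:
        raise ValueError("at least one region is required")
    # One 'region' parameter per requested interval; keep ':' readable.
    query = urlencode([("region", r) for r in regions], safe=":")
    return f"{GDC_SLICING_ENDPOINT}/{file_uuid}?{query}"


def variant_allele_fraction(ref_count, alt_count):
    """Fraction of reads supporting the alternate allele, or None at zero depth."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else None


if __name__ == "__main__":
    # "FILE-UUID" is a placeholder, not a real GDC file identifier.
    print(build_slice_url("FILE-UUID", ["chr1", "chr2:10000-20000"]))
    print(variant_allele_fraction(ref_count=30, alt_count=10))
```

Requesting only the regions of interest is what keeps the compute burden low: the full BAM never has to be downloaded. Comparing the allele fraction of the same variant in DNA and RNA slices is one simple way to look for allelic bias.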