Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2.

Impact factor: 2.8; JCR Q1, Genetics & Heredity

NAR Genomics and Bioinformatics. Pub Date: 2025-08-19; eCollection Date: 2025-09-01; DOI: 10.1093/nargab/lqaf108

Gregory B Gloor, Michelle Pistner Nixon, Justin D Silverman

Journal Article. NAR Genomics and Bioinformatics, volume 7, issue 3, article lqaf108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362245/pdf/

Citations: 0

Abstract

In high-throughput sequencing (HTS) studies, sample-to-sample variation in sequencing depth is driven by technical factors, not by variation in the scale (size) of the biological system. Typically, a statistical normalization removes unwanted technical variation in the data or the parameters of the model to enable differential abundance analyses. We recently showed that all normalizations make implicit assumptions about the unmeasured system scale and that errors in these assumptions can dramatically increase false positive and false negative rates. We demonstrated that these errors can be mitigated by accounting for uncertainty using a scale model, which we integrated into the ALDEx2 R package. This article provides new insights focusing on the application to transcriptomic analysis. We provide transcriptomic case studies demonstrating how scale models, rather than traditional normalizations, can reduce false positive and false negative rates in practice while enhancing the transparency and reproducibility of analyses. These scale models replace the dual-cutoff approaches often used to address the disconnect between practical and statistical significance. We demonstrate the utility of scale models built from known housekeeping genes in complex metatranscriptomic datasets. Thus, this work provides guidance on how to incorporate scale into the analysis of transcriptomic data sets.
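To make the idea of a scale model concrete, here is a minimal sketch in Python of how uncertainty about system scale could be layered on top of a centered log-ratio (CLR) style normalization. It is an illustration under stated assumptions, not ALDEx2's implementation (ALDEx2 is an R package): the function names, the Dirichlet prior of 0.5, and the choice to add Normal(0, gamma^2) noise to the log2 scale of each Monte Carlo draw are all assumptions made for this example.

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def scaled_log_abundance(counts, gamma, n_draws=128, prior=0.5, seed=0):
    """Monte Carlo draws of log2 abundance for one sample (hypothetical helper).

    Each draw samples proportions from Dirichlet(counts + prior), then adds a
    log2 scale term: the CLR assumption (minus the mean log2 proportion) plus
    Normal(0, gamma^2) noise. With gamma = 0 this reduces to the plain CLR.
    """
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]
    out = []
    for _ in range(n_draws):
        log_p = [math.log2(p) for p in dirichlet(alpha, rng)]
        clr_scale = -sum(log_p) / len(log_p)           # scale implied by the CLR
        log_scale = clr_scale + rng.gauss(0.0, gamma)  # assumed scale uncertainty
        out.append([lp + log_scale for lp in log_p])
    return out


if __name__ == "__main__":
    counts = [10, 200, 35, 0]  # made-up counts for four genes in one sample
    for g in (0.0, 1.0):
        draws = scaled_log_abundance(counts, gamma=g, n_draws=200)
        means = [sum(d) / len(d) for d in draws]
        print(f"gamma={g}: spread of per-draw mean = {max(means) - min(means):.3f}")
```

With `gamma=0.0` every draw is centered at zero, which corresponds to treating the normalization's scale assumption as exact. A larger `gamma` lets the overall level of each draw vary, so downstream differences between groups must be large relative to that uncertainty before they appear significant.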
