Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2.

IF 2.8 Q1 GENETICS & HEREDITY
NAR Genomics and Bioinformatics. Pub Date: 2025-08-19; eCollection Date: 2025-09-01; DOI: 10.1093/nargab/lqaf108
Gregory B Gloor, Michelle Pistner Nixon, Justin D Silverman
Volume 7, issue 3, article lqaf108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362245/pdf/

Abstract

In high-throughput sequencing (HTS) studies, sample-to-sample variation in sequencing depth is driven by technical factors, not by variation in the scale (size) of the biological system. Typically, a statistical normalization is applied to the data or to the parameters of the model to remove unwanted technical variation and enable differential abundance analyses. We recently showed that all normalizations make implicit assumptions about the unmeasured system scale and that errors in these assumptions can dramatically increase false positive and false negative rates. We demonstrated that these errors can be mitigated by accounting for uncertainty in scale using a scale model, which we integrated into the ALDEx2 R package. This article provides new insights focused on applications to transcriptomic analysis. We provide transcriptomic case studies demonstrating how scale models, rather than traditional normalizations, can reduce false positive and false negative rates in practice while enhancing the transparency and reproducibility of analyses. These scale models remove the need for the dual-cutoff approaches often used to address the disconnect between practical and statistical significance. We also demonstrate the utility of scale models built on known housekeeping genes in complex metatranscriptomic datasets. Thus, this work provides guidance on how to incorporate scale into the analysis of transcriptomic datasets.
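
The idea of a scale model that accounts for uncertainty can be illustrated with a small conceptual sketch. It is not the ALDEx2 implementation and does not use the ALDEx2 API. It assumes, purely for illustration, that the log2 scale of each sample is centred on the value implied by a centred log-ratio (CLR) normalization and perturbed with Normal(0, gamma^2) noise. The function names, the `gamma` parameter, the Dirichlet prior of 0.5, and the toy counts are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def dirichlet_log2_props(counts, rng, prior=0.5):
    """Draw one Dirichlet(counts + prior) sample; return log2 proportions."""
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [math.log2(d / total) for d in draws]


def scaled_log2_abundance(counts, rng, gamma=0.5):
    """One Monte Carlo draw of log2 abundances under an assumed scale model.

    The log2 scale is centred on the CLR assumption (the negative mean of
    the log2 proportions) plus Normal(0, gamma^2) noise. With gamma = 0
    this reduces to the plain CLR transform.
    """
    logp = dirichlet_log2_props(counts, rng)
    clr_scale = -sum(logp) / len(logp)
    log2_scale = clr_scale + rng.gauss(0.0, gamma)
    return [lp + log2_scale for lp in logp]


def group_difference_draws(group_a, group_b, gene, gamma, n_draws=500, seed=1):
    """Monte Carlo draws of (mean of group_b) - (mean of group_a) for one gene."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_draws):
        mean_a = statistics.fmean(
            scaled_log2_abundance(s, rng, gamma)[gene] for s in group_a
        )
        mean_b = statistics.fmean(
            scaled_log2_abundance(s, rng, gamma)[gene] for s in group_b
        )
        diffs.append(mean_b - mean_a)
    return diffs


# Toy counts (made up): rows are samples, columns are genes.
control = [[100, 500, 400, 1000], [120, 480, 390, 1100], [90, 510, 420, 950]]
treated = [[200, 490, 410, 1050], [210, 500, 380, 990], [190, 520, 400, 1020]]

for gamma in (0.0, 1.0):
    diffs = group_difference_draws(control, treated, gene=0, gamma=gamma)
    print(
        f"gamma={gamma}: mean difference={statistics.fmean(diffs):.2f}, "
        f"sd={statistics.stdev(diffs):.2f}"
    )
```

In this sketch, a larger `gamma` widens the distribution of the estimated between-group difference, so a gene needs a larger shift to stand out from the uncertainty about scale. The real package should be consulted for its actual interface and defaults.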

Source journal metrics: CiteScore 8.00; self-citation rate 2.20%; articles published: 95; review time: 15 weeks.