Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses.

Impact factor: 3.3; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Kyle C McGovern, Justin D Silverman
BMC Bioinformatics 26(1):164. Published 2025-07-01. DOI: 10.1186/s12859-025-06177-2 (https://doi.org/10.1186/s12859-025-06177-2). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12218962/pdf/
Citations: 0

Abstract

Background: Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply strict, unrealistic assumptions about the unmeasured scale of biological systems (e.g., microbial load or total cellular transcription). Even slight errors in these assumptions introduce bias, leading to elevated false positive and negative rates.

Results: We introduce interval assumptions as a generalization of normalizations. Unlike normalizations, our interval methods allow researchers to account for potential errors in assumptions about the system scale. Interval assumptions are also customizable and allow researchers to express more biologically plausible assumptions about scale. Interval assumptions even generalize Quantitative Microbiome Profiling (QMP), allowing researchers to account for errors in flow cytometry-based measurements of total cellular concentration. We develop a novel hypothesis testing framework that allows us to integrate interval assumptions into existing tools. We develop a modified version of the popular ALDEx2 method using interval assumptions rather than normalizations. Through real and simulated data analyses, we find that interval assumptions can dramatically decrease false positive rates (i.e., from 45% to 5%) while retaining or increasing statistical power. We also study interval assumptions under misspecification and show they still improve on normalizations.

Conclusions: Interval assumptions enhance the rigor and reproducibility of differential expression and differential abundance analyses. Our results add to a growing body of literature arguing that normalizations should be replaced with alternative methods that allow researchers to account for scale uncertainty. However, compared to recent alternatives like scale models and sensitivity analyses, interval assumptions are easier to use, are more robust to misspecification, and have stronger and more interpretable inferential guarantees.
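
The contrast the abstract draws between a normalization and an interval assumption can be illustrated with a small sketch. It is not the authors' hypothesis testing framework or their modified ALDEx2; it is a toy under stated assumptions. It relies on the identity that the log fold change (LFC) of an entity's absolute abundance equals the LFC of its proportion plus the LFC of the system scale. A normalization that, like total-sum scaling, implicitly treats the scale as unchanged fixes the scale LFC at a single value (0 here), whereas an interval assumption only bounds it. The function names, the toy counts, the bounds `[0.0, 1.0]`, and the decision rule "call a change only if the whole interval excludes zero" are all hypothetical choices for illustration, and sampling uncertainty is ignored entirely.

```python
import math
from statistics import mean


def compositional_lfc(counts_a, counts_b):
    """Per-entity difference in mean log-proportion (condition B minus A).

    Assumes every count is strictly positive (no zero handling here).
    """
    def mean_log_props(samples):
        logs = []
        for sample in samples:
            total = sum(sample)
            logs.append([math.log(c / total) for c in sample])
        n_entities = len(samples[0])
        return [mean(row[d] for row in logs) for d in range(n_entities)]

    log_a = mean_log_props(counts_a)
    log_b = mean_log_props(counts_b)
    return [b - a for a, b in zip(log_a, log_b)]


def call_with_scale_interval(comp_lfc, scale_lower, scale_upper):
    """Classify one entity when the scale LFC is assumed to lie in
    [scale_lower, scale_upper]. Hypothetical rule: report a direction only
    if the implied interval for the absolute LFC excludes zero."""
    lower = comp_lfc + scale_lower
    upper = comp_lfc + scale_upper
    if lower > 0:
        return "increased"
    if upper < 0:
        return "decreased"
    return "undetermined"


# Toy data (hypothetical): in truth only entity 0 grows (4x in absolute
# terms), so the total scale doubles and the proportions of entities 1
# and 2 halve even though their absolute abundances are unchanged.
condition_a = [[100, 100, 100], [200, 200, 200]]
condition_b = [[200, 50, 50], [400, 100, 100]]

lfcs = compositional_lfc(condition_a, condition_b)

# Point assumption: scale LFC is exactly 0 (scale treated as unchanged).
point_calls = [call_with_scale_interval(x, 0.0, 0.0) for x in lfcs]

# Interval assumption: scale LFC lies somewhere in [0, 1] (natural log).
interval_calls = [call_with_scale_interval(x, 0.0, 1.0) for x in lfcs]

print("point assumption:   ", point_calls)
print("interval assumption:", interval_calls)
```

In this toy case the point assumption reports entities 1 and 2 as decreased, which is wrong given how the data were constructed, while the interval assumption still reports entity 0 as increased and leaves the other two undetermined. Widening the interval trades fewer false calls for more undetermined ones, which is why the bounds should reflect what is biologically plausible for the system.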

Source journal: BMC Bioinformatics
Category: Biology, Biochemical Research Methods
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.