Taxanorm: a novel taxa-specific normalization approach for microbiome data

IF 2.9 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS
Ziyue Wang, Dillon Lloyd, Shanshan Zhao, Alison Motsinger-Reif
{"title":"Taxanorm: a novel taxa-specific normalization approach for microbiome data","authors":"Ziyue Wang, Dillon Lloyd, Shanshan Zhao, Alison Motsinger-Reif","doi":"10.1186/s12859-024-05918-z","DOIUrl":null,"url":null,"abstract":"In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample specific but not taxa specific, which can result in over- or under-correction for some taxa. We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes the effects of sequencing depth on mean and dispersion vary across taxa. Incorporating the zero-inflation part can better capture the nature of microbiome data. We also propose two corresponding diagnosis tests on the varying sequencing depth effect for validation. We find that TaxaNorm achieves comparable performance to existing methods in most simulation scenarios in downstream analysis and reaches a higher power for some cases. Specifically, it balances power and false discovery control well. When applying the method in a real dataset, TaxaNorm has improved performance when correcting technical bias. TaxaNorm both sample- and taxon- specific bias by introducing an appropriate regression framework in the microbiome data, which aids in data interpretation and visualization. The ‘TaxaNorm’ R package is freely available through the CRAN repository https://CRAN.R-project.org/package=TaxaNorm and the source code can be downloaded at https://github.com/wangziyue57/TaxaNorm .","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05918-z","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample specific but not taxa specific, which can result in over- or under-correction for some taxa. We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes the effects of sequencing depth on mean and dispersion vary across taxa. Incorporating the zero-inflation part can better capture the nature of microbiome data. We also propose two corresponding diagnosis tests on the varying sequencing depth effect for validation. We find that TaxaNorm achieves comparable performance to existing methods in most simulation scenarios in downstream analysis and reaches a higher power for some cases. Specifically, it balances power and false discovery control well. When applying the method in a real dataset, TaxaNorm has improved performance when correcting technical bias. TaxaNorm both sample- and taxon- specific bias by introducing an appropriate regression framework in the microbiome data, which aids in data interpretation and visualization. The ‘TaxaNorm’ R package is freely available through the CRAN repository https://CRAN.R-project.org/package=TaxaNorm and the source code can be downloaded at https://github.com/wangziyue57/TaxaNorm .
Taxanorm:微生物组数据的新型特定分类归一化方法
在高通量测序研究中,不同样本的测序深度(量化读数总数)各不相同。不同的测序深度会掩盖真正的生物信号,无法对不同样本进行直接比较。为了消除因测序深度不同而产生的变异,通常会在下游分析前对分类群计数进行归一化处理。然而,大多数现有的归一化方法都是使用特定于样本而非特定于分类群的大小因子来对计数进行缩放,这可能会导致某些分类群的校正过度或不足。我们开发了 TaxaNorm,这是一种基于零膨胀负二叉模型的新型归一化方法。该方法假定测序深度对平均值和离散度的影响在不同类群之间各不相同。加入零膨胀部分可以更好地捕捉微生物组数据的本质。我们还针对测序深度效应的变化提出了两个相应的诊断检测来进行验证。我们发现,在下游分析的大多数模拟场景中,TaxaNorm 的性能与现有方法相当,在某些情况下还能达到更高的功率。具体来说,它很好地平衡了功率和误发现控制。将该方法应用于真实数据集时,TaxaNorm 在纠正技术偏差方面的性能有所提高。TaxaNorm 通过在微生物组数据中引入适当的回归框架,消除了样本和分类群的特定偏差,有助于数据解释和可视化。TaxaNorm "R 软件包可通过 CRAN 存储库 https://CRAN.R-project.org/package=TaxaNorm 免费获取,源代码可从 https://github.com/wangziyue57/TaxaNorm 下载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信