Normalizing need not be the norm: count-based math for analyzing single-cell data.

IF 1.3 · Zone 4 (Biology) · JCR Q3 (BIOLOGY)
Theory in Biosciences Pub Date : 2024-02-01 Epub Date: 2023-11-10 DOI:10.1007/s12064-023-00408-x
Samuel H Church, Jasmine L Mah, Günter Wagner, Casey W Dunn

Abstract

Counting transcripts of mRNA is a key method of observation in modern biology. With advances in counting transcripts in single cells (single-cell RNA sequencing or scRNA-seq), these data are routinely used to identify cells by their transcriptional profile, and to identify genes with differential cellular expression. Because the total number of transcripts counted per cell can vary for technical reasons, the first step of many commonly used scRNA-seq workflows is to normalize by sequencing depth, transforming counts into proportional abundances. The primary objective of this step is to reshape the data such that cells with similar biological proportions of transcripts end up with similar transformed measurements. But there is growing concern that normalization and other transformations result in unintended distortions that hinder both analyses and the interpretation of results. This has led to an intense focus on optimizing methods for normalization and transformation of scRNA-seq data. Here, we take an alternative approach, by avoiding normalization and transformation altogether. We abandon the use of distances to compare cells, and instead use a restricted algebra, motivated by measurement theory and abstract algebra, that preserves the count nature of the data. We demonstrate that this restricted algebra is sufficient to draw meaningful and practical comparisons of gene expression through the use of the dot product and other elementary operations. This approach sidesteps many of the problems with common transformations, and has the added benefit of being simpler and more intuitive. We implement our approach in the package countland, available in Python and R.
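
As a rough illustration of what comparing cells with the dot product on untransformed counts could look like, here is a minimal NumPy sketch. It is not the countland API and not the authors' implementation: the count matrix, the variable names, and the helper `pairwise_dot` are all hypothetical, chosen only to show that the operation stays in integer counts with no normalization step.

```python
import numpy as np

# Hypothetical toy count matrix (made-up values): rows are cells, columns are genes.
counts = np.array(
    [
        [3, 0, 1, 0],  # cell 0
        [2, 0, 2, 0],  # cell 1, expresses the same genes as cell 0
        [0, 4, 0, 1],  # cell 2, expresses a disjoint set of genes
    ],
    dtype=np.int64,
)


def pairwise_dot(count_matrix: np.ndarray) -> np.ndarray:
    """Return the cell-by-cell matrix of dot products of raw integer counts.

    Illustrative helper only; the name and signature are assumptions,
    not part of any published package.
    """
    return count_matrix @ count_matrix.T


dots = pairwise_dot(counts)
print(dots)
# [[10  8  0]
#  [ 8  8  0]
#  [ 0  0 17]]
```

In this toy case, cells 0 and 1 share transcripts of the same genes and have a dot product of 8, while cell 2 shares none with either and has a dot product of 0. The result is still an integer, so no division by sequencing depth or log transform is involved.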

Source journal
Theory in Biosciences (Biology)
CiteScore: 2.70
Self-citation rate: 9.10%
Articles published: 21
Review time: 3 months
Journal description: Theory in Biosciences focuses on new concepts in theoretical biology. It also includes analytical and modelling approaches as well as philosophical and historical issues. Central topics are: Artificial Life; Bioinformatics with a focus on novel methods, phenomena, and interpretations; Bioinspired Modeling; Complexity, Robustness, and Resilience; Embodied Cognition; Evolutionary Biology; Evo-Devo; Game Theoretic Modeling; Genetics; History of Biology; Language Evolution; Mathematical Biology; Origin of Life; Philosophy of Biology; Population Biology; Systems Biology; Theoretical Ecology; Theoretical Molecular Biology; Theoretical Neuroscience & Cognition.