Distinct characteristics of correlation analysis at the single-cell and the population level

IF 0.9 4区数学 Q3 Mathematics

Statistical Applications in Genetics and Molecular Biology Pub Date : 2020-08-19 DOI:10.21203/rs.3.rs-42825/v1

Guoyu Wu, Yuchao Li

{"title":"Distinct characteristics of correlation analysis at the single-cell and the population level","authors":"Guoyu Wu, Yuchao Li","doi":"10.21203/rs.3.rs-42825/v1","DOIUrl":null,"url":null,"abstract":"Abstract Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interests, for its ability to obtain high-resolution molecular phenotypes. It turns out that there is little overlap of co-expressed genes identified in single-cell level investigations with that of population level investigations. However, the nature of the relationship of correlations between single-cell and population levels remains unclear. In this manuscript, we aimed to unveil the origin of the differences between the correlation coefficients at the single-cell level and that at the population level, and bridge the gap between them. Through developing formulations to link correlations at the single-cell and the population level, we illustrated that aggregated correlations could be stronger, weaker or equal to the corresponding individual correlations, depending on the variations and the correlations within the population. When the correlation within the population is weaker than the individual correlation, the aggregated correlation is stronger than the corresponding individual correlation. Besides, our data indicated that aggregated correlation is more likely to be stronger than the corresponding individual correlation, and it was rare to find gene-pairs exclusively strongly correlated at the single-cell level. Through a bottom-up approach to model interactions between molecules in a signaling cascade or a multi-regulator-controlled gene expression, we surprisingly found that the existence of interaction between two components could not be excluded simply based on their low correlation coefficients, suggesting a reconsideration of connectivity within biological networks which was derived solely from correlation analysis. We also investigated the impact of technical random measurement errors on the correlation coefficients for the single-cell level and the population level. The results indicate that the aggregated correlation is relatively robust and less affected. Because of the heterogeneity among single cells, correlation coefficients calculated based on data of the single-cell level might be different from that of the population level. Depending on the specific question we are asking, proper sampling and normalization procedure should be done before we draw any conclusions.","PeriodicalId":49477,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"0 1","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2020-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.21203/rs.3.rs-42825/v1","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Mathematics","Score":null,"Total":0}

引用次数: 0

Abstract

Abstract Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interests, for its ability to obtain high-resolution molecular phenotypes. It turns out that there is little overlap of co-expressed genes identified in single-cell level investigations with that of population level investigations. However, the nature of the relationship of correlations between single-cell and population levels remains unclear. In this manuscript, we aimed to unveil the origin of the differences between the correlation coefficients at the single-cell level and that at the population level, and bridge the gap between them. Through developing formulations to link correlations at the single-cell and the population level, we illustrated that aggregated correlations could be stronger, weaker or equal to the corresponding individual correlations, depending on the variations and the correlations within the population. When the correlation within the population is weaker than the individual correlation, the aggregated correlation is stronger than the corresponding individual correlation. Besides, our data indicated that aggregated correlation is more likely to be stronger than the corresponding individual correlation, and it was rare to find gene-pairs exclusively strongly correlated at the single-cell level. Through a bottom-up approach to model interactions between molecules in a signaling cascade or a multi-regulator-controlled gene expression, we surprisingly found that the existence of interaction between two components could not be excluded simply based on their low correlation coefficients, suggesting a reconsideration of connectivity within biological networks which was derived solely from correlation analysis. We also investigated the impact of technical random measurement errors on the correlation coefficients for the single-cell level and the population level. The results indicate that the aggregated correlation is relatively robust and less affected. Because of the heterogeneity among single cells, correlation coefficients calculated based on data of the single-cell level might be different from that of the population level. Depending on the specific question we are asking, proper sampling and normalization procedure should be done before we draw any conclusions.

查看原文本刊更多论文

单细胞水平和群体水平相关分析的显著特征

相关分析在生物学研究中被广泛用于推断生物网络中的分子关系。最近，单细胞分析已经引起了极大的兴趣，因为它能够获得高分辨率的分子表型。结果表明，在单细胞水平调查中发现的共表达基因与群体水平调查中发现的共表达基因几乎没有重叠。然而，单细胞水平和群体水平之间的相关性关系的本质仍不清楚。在这篇文章中，我们旨在揭示单细胞水平上的相关系数与群体水平上的相关系数差异的来源，并弥合它们之间的差距。通过开发将单细胞和种群水平的相关性联系起来的公式，我们说明了，根据种群内的变异和相关性，聚合相关性可能比相应的个体相关性更强、更弱或等于个体相关性。当群体内相关性弱于个体相关性时，总体相关性强于相应的个体相关性。此外，我们的数据表明，总体相关性可能比相应的个体相关性更强，并且很少发现基因对在单细胞水平上完全强相关。通过自下而上的方法来模拟信号级联分子之间的相互作用或多调节因子控制的基因表达，我们惊讶地发现，不能简单地基于它们的低相关系数来排除两个组分之间相互作用的存在，这表明重新考虑生物网络中仅由相关分析得出的连性。我们还研究了技术随机测量误差对单细胞水平和种群水平相关系数的影响。结果表明，综合相关性具有较强的鲁棒性，受影响较小。由于单细胞间的异质性，根据单细胞水平计算的相关系数可能与群体水平计算的相关系数不同。根据我们所问的具体问题，在我们得出任何结论之前，应该进行适当的抽样和归一化程序。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Statistical Applications in Genetics and Molecular Biology 生物-生化与分子生物学

CiteScore

1.20

自引率

11.10%

发文量

审稿时长

6-12 weeks

期刊介绍： Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.