Analysis of a Single Cell RNA-seq Workflow by Random Matrix Theory Methods.

IF 2.0, Zone 4 (Mathematics), JCR Q2 (Biology)

Bulletin of Mathematical Biology Pub Date : 2024-11-25 DOI:10.1007/s11538-024-01376-z

Sivan Leviyang

Volume: 87 (1), Pages: 4, Publication type: Journal Article

Citations: 0

Abstract


Analysis of a Single Cell RNA-seq Workflow by Random Matrix Theory Methods.

Single cell RNA-seq (scRNAseq) workflows typically start with a count matrix and end with the clustering of sampled cells. While a range of methods have been developed to cluster scRNAseq datasets, no theoretical tools exist to explain why a particular cluster exists or why a hypothesized cluster is missing. Recently, several authors have shown that eigenvalues of scRNAseq count matrices can be approximated using random matrix models. In this work, we extend this previous work to the study of a scRNAseq workflow. We model scaled count matrices using random matrices with normally distributed entries. Using these random matrix models, we quantify the differential expression of a cluster and develop predictions for the workflow, and in particular for clustering, as a function of the differential expression. We also use results from random matrix theory (RMT) to develop predictive formulas for portions of the scRNAseq workflow. Using simulated and real datasets, we show that our predictions are accurate if certain conditions on differential expression hold, with our RMT-based predictions requiring particularly stringent conditions. We find that real datasets violate these conditions, leading to bias in our predictions, but our predictions are better than a naive estimator, and we point out future work that can improve the predictions. To our knowledge, our formulas represent the first predictive results for scRNAseq workflows.
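
The link between differential expression and the eigenvalues of a normally distributed random matrix can be illustrated with a small simulation. The sketch below is not the paper's model or its formulas. It assumes a generic rank-one "spiked" setup: unit-variance Gaussian noise, one planted cluster whose cells have their mean shifted on a subset of genes, and the standard outlier-eigenvalue formula for rank-one perturbations from the random matrix literature. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def mp_upper_edge(n_cells: int, n_genes: int) -> float:
    """Upper edge of the Marchenko-Pastur bulk for unit-variance noise."""
    return (1.0 + np.sqrt(n_genes / n_cells)) ** 2


def spike_strength(shift: float, n_de_genes: int, cluster_frac: float) -> float:
    """Squared singular value of the centered mean matrix, divided by n_cells."""
    return cluster_frac * (1.0 - cluster_frac) * n_de_genes * shift ** 2


def predicted_top_eigenvalue(theta: float, n_cells: int, n_genes: int) -> float:
    """Standard rank-one spiked-model prediction (an assumption of this sketch).

    Below the threshold sqrt(gamma) the top eigenvalue stays at the bulk edge,
    so the planted cluster is not visible in the spectrum.
    """
    gamma = n_genes / n_cells
    if theta <= np.sqrt(gamma):
        return mp_upper_edge(n_cells, n_genes)
    return (1.0 + theta) * (1.0 + gamma / theta)


def simulated_top_eigenvalue(n_cells, n_genes, shift,
                             n_de_genes=50, cluster_frac=0.5, seed=0):
    """Largest eigenvalue of the gene-gene sample covariance of simulated data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_cells, n_genes))   # noise: iid N(0, 1) entries
    n_cluster = int(cluster_frac * n_cells)
    x[:n_cluster, :n_de_genes] += shift           # planted differential expression
    x -= x.mean(axis=0)                           # center each gene
    top_singular_value = np.linalg.svd(x, compute_uv=False)[0]
    return top_singular_value ** 2 / n_cells


if __name__ == "__main__":
    n_cells, n_genes = 1000, 500
    print("bulk edge:", mp_upper_edge(n_cells, n_genes))
    for shift in (0.1, 1.0):
        theta = spike_strength(shift, 50, 0.5)
        print("shift", shift,
              "simulated", simulated_top_eigenvalue(n_cells, n_genes, shift),
              "predicted", predicted_top_eigenvalue(theta, n_cells, n_genes))
```

In this toy setting a weak shift leaves the top eigenvalue at the noise bulk edge, while a strong shift produces an outlier eigenvalue well above it. This is one way a hypothesized cluster can be present in the data yet remain invisible to a spectral step of a workflow.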


Source journal

Bulletin of Mathematical Biology (Biology)

CiteScore: 3.90

Self-citation rate: 8.60%

Articles published: 123

Review time: 7.5 months

About the journal: The Bulletin of Mathematical Biology, the official journal of the Society for Mathematical Biology, disseminates original research findings and other information relevant to the interface of biology and the mathematical sciences. Contributions should have relevance to both fields. To accommodate the broad scope of new developments, the journal accepts a variety of contributions, including: original research articles focused on new biological insights gained with the help of tools from the mathematical sciences, or on new mathematical tools and methods with demonstrated applicability to biological investigations; research in mathematical biology education; reviews; commentaries; perspectives; and contributions that discuss issues important to the profession. All contributions are peer-reviewed.