Analysis of a Single Cell RNA-seq Workflow by Random Matrix Theory Methods
Sivan Leviyang
Bulletin of Mathematical Biology, 87(1): 4. Published 2024-11-25. DOI: 10.1007/s11538-024-01376-z
Single cell RNA-seq (scRNAseq) workflows typically start with a count matrix and end with the clustering of sampled cells. While a range of methods have been developed to cluster scRNAseq datasets, no theoretical tools exist to explain why a particular cluster exists or why a hypothesized cluster is missing. Recently, several authors have shown that eigenvalues of scRNAseq count matrices can be approximated using random matrix models. In this work, we extend these previous works to the study of a scRNAseq workflow. We model scaled count matrices using random matrices with normally distributed entries. Using these random matrix models, we quantify the differential expression of a cluster and develop predictions for the workflow, and in particular for clustering, as a function of the differential expression. We also use results from random matrix theory (RMT) to develop predictive formulas for portions of the scRNAseq workflow. Using simulated and real datasets, we show that our predictions are accurate if certain conditions on differential expression hold, with our RMT-based predictions requiring particularly stringent conditions. We find that real datasets violate these conditions, which biases our predictions; nonetheless, our predictions outperform a naive estimator, and we point out future work that could improve them. To our knowledge, our formulas represent the first predictive results for scRNAseq workflows.
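To make the random matrix idea concrete, the sketch below uses a toy model that is an assumption, not the paper's model or formulas. A scaled genes × cells matrix is i.i.d. N(0, 1) noise plus a "cluster": the first `cluster_size` cells are shifted by `delta` in `n_de_genes` genes. After dividing by sqrt(n_cells), the cluster is a rank-one perturbation of strength θ = delta·sqrt(n_de_genes·cluster_size/n_cells). A standard RMT result for rank-one perturbations of Gaussian rectangular matrices (a BBP-type phase transition) predicts two regimes, with aspect ratio c = n_genes/n_cells. If θ ≤ c^(1/4), the top singular value stays at the noise bulk edge 1 + √c. Otherwise it separates from the bulk at √((1+θ²)(c+θ²))/θ. All function names and parameters are hypothetical, and the usual centering step is omitted for simplicity.

```python
import numpy as np

# Hypothetical helpers for a toy Gaussian model; a sketch, not the paper's method.


def simulate_scaled_counts(n_cells, n_genes, cluster_size, n_de_genes, delta, rng):
    """Toy scaled genes x cells matrix: i.i.d. N(0, 1) noise plus a mean shift
    `delta` in the first `n_de_genes` genes for the first `cluster_size` cells
    (the 'cluster'). No centering is applied (simplifying assumption)."""
    X = rng.standard_normal((n_genes, n_cells))
    X[:n_de_genes, :cluster_size] += delta
    return X


def bulk_edge(n_cells, n_genes):
    """Limiting largest singular value of pure noise X / sqrt(n_cells):
    1 + sqrt(c), with aspect ratio c = n_genes / n_cells."""
    return 1.0 + np.sqrt(n_genes / n_cells)


def signal_strength(n_cells, cluster_size, n_de_genes, delta):
    """Norm theta of the rank-one signal mu 1_C^T / sqrt(n_cells), where
    ||mu|| = delta * sqrt(n_de_genes) and ||1_C|| = sqrt(cluster_size)."""
    return delta * np.sqrt(n_de_genes * cluster_size / n_cells)


def predicted_top_singular_value(n_cells, n_genes, cluster_size, n_de_genes, delta):
    """Limiting top singular value under a rank-one Gaussian perturbation:
    an outlier separates from the bulk only when theta > c**0.25."""
    c = n_genes / n_cells
    theta = signal_strength(n_cells, cluster_size, n_de_genes, delta)
    if theta <= c ** 0.25:
        return bulk_edge(n_cells, n_genes)  # signal is absorbed by the noise bulk
    return np.sqrt((1 + theta**2) * (c + theta**2)) / theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells, n_genes, cluster_size, n_de = 1000, 200, 100, 20
    for delta in (0.3, 1.5):
        X = simulate_scaled_counts(n_cells, n_genes, cluster_size, n_de, delta, rng)
        # Singular values come back in descending order; s[0] is the largest.
        s = np.linalg.svd(X / np.sqrt(n_cells), compute_uv=False)
        theta = signal_strength(n_cells, cluster_size, n_de, delta)
        pred = predicted_top_singular_value(n_cells, n_genes, cluster_size, n_de, delta)
        print(f"delta={delta}: theta={theta:.3f}, observed={s[0]:.3f}, "
              f"predicted={pred:.3f}, bulk edge={bulk_edge(n_cells, n_genes):.3f}")
```

In this toy setting, a cluster whose differential expression puts θ below the threshold produces no singular value outside the noise bulk. PCA-based clustering then has no distinct signal direction to pick up. This is one simple way to see how differential expression could govern whether a cluster is recoverable.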
About the journal:
The Bulletin of Mathematical Biology, the official journal of the Society for Mathematical Biology, disseminates original research findings and other information relevant to the interface of biology and the mathematical sciences. Contributions should have relevance to both fields. In order to accommodate the broad scope of new developments, the journal accepts a variety of contributions, including:
Original research articles focused on new biological insights gained with the help of tools from the mathematical sciences or new mathematical tools and methods with demonstrated applicability to biological investigations
Research in mathematical biology education
Reviews
Commentaries
Perspectives, and contributions that discuss issues important to the profession
All contributions are peer-reviewed.