{"title":"Repeated Decision Stumping Distils Simple Rules from Single-Cell Data.","authors":"Ivan A Croydon-Veleslavov, Michael P H Stumpf","doi":"10.1089/cmb.2021.0613","DOIUrl":null,"url":null,"abstract":"<p><p>Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and given rise to a large armory of statistical and machine learning approaches. The majority of approaches focuses on either describing features of these data, or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one-hence \"stumps\"-to identify in an inductive manner, gene products involved in driving cell fate transitions, and in applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. Our algorithm is deliberately targeting the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors; the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"21-40"},"PeriodicalIF":1.4000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1089/cmb.2021.0613","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/4 0:00:00","PubModel":"Epub","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and given rise to a large armory of statistical and machine learning approaches. The majority of approaches focuses on either describing features of these data, or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one-hence "stumps"-to identify in an inductive manner, gene products involved in driving cell fate transitions, and in applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. Our algorithm is deliberately targeting the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors; the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.
期刊介绍:
Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.
Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases