Theresa Willem, Vladimir A. Shitov, Malte D. Luecken, Niki Kilbertus, Stefan Bauer, Marie Piraud, Alena Buyx, Fabian J. Theis
Biases in machine-learning models of human single-cell data
Nature Cell Biology, volume 27, issue 3, pages 384-392. Published 2025-02-19. DOI: 10.1038/s41556-025-01619-8. https://www.nature.com/articles/s41556-025-01619-8
Abstract
Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis, ranging from societal biases affecting whose samples are collected, to clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples and biases during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.
This Perspective discusses the various biases that can emerge along the pipeline of machine learning-based single-cell analysis and presents methods to train models on human single-cell data in order to assess and mitigate these biases.
About the journal:
Nature Cell Biology, a prestigious journal, upholds a commitment to publishing papers of the highest quality across all areas of cell biology, with a particular focus on elucidating mechanisms underlying fundamental cell biological processes. The journal's broad scope encompasses various areas of interest, including but not limited to:
-Autophagy
-Cancer biology
-Cell adhesion and migration
-Cell cycle and growth
-Cell death
-Chromatin and epigenetics
-Cytoskeletal dynamics
-Developmental biology
-DNA replication and repair
-Mechanisms of human disease
-Mechanobiology
-Membrane traffic and dynamics
-Metabolism
-Nuclear organization and dynamics
-Organelle biology
-Proteolysis and quality control
-RNA biology
-Signal transduction
-Stem cell biology