Jonathan Karin, Reshef Mintz, Barak Raveh, Mor Nitzan
Interpreting single-cell and spatial omics data using deep neural network training dynamics
Nature Computational Science 4(12), 941-954. Published 2024-12-04.
DOI: 10.1038/s43588-024-00721-5
https://www.nature.com/articles/s43588-024-00721-5
Citations: 0
Abstract
Single-cell and spatial omics datasets can be organized and interpreted by annotating single cells to distinct types, states, locations or phenotypes. However, cell annotations are inherently ambiguous, as discrete labels with subjective interpretations are assigned to heterogeneous cell populations on the basis of noisy, sparse and high-dimensional data. Here we developed Annotatability, a framework for identifying annotation mismatches and characterizing biological data structure by monitoring the dynamics and difficulty of training a deep neural network over such annotated data. Following this, we developed a signal-aware graph embedding method that enables downstream analysis of biological signals. This embedding captures cellular communities associated with target signals. Using Annotatability, we address key challenges in the interpretation of genomic data, demonstrated over eight single-cell RNA sequencing and spatial omics datasets, including identifying erroneous annotations and intermediate cell states, delineating developmental or disease trajectories, and capturing cellular heterogeneity. These results underscore the broad applicability of annotation-trainability analysis via Annotatability for unraveling cellular diversity and interpreting collective cell behaviors in health and disease.

The Annotatability framework analyzes neural network training dynamics to interpret single-cell and spatial omics data. It identifies erroneous annotations and ambiguous cell states, infers trajectories from binary labels and enables signal-aware analysis.
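
The idea of monitoring training dynamics to flag annotation mismatches can be illustrated with a small sketch. This is not the Annotatability implementation: the abstract does not specify which statistics are tracked, so the two used here (the mean and the standard deviation, across epochs, of the probability the model assigns to each cell's given label) are assumptions, as are the function name `training_dynamics`, the synthetic data, and the use of a plain softmax-regression classifier as a stand-in for a deep neural network.

```python
import numpy as np

def training_dynamics(X, y, n_classes, epochs=30, lr=0.5):
    """Hypothetical helper: train softmax regression with full-batch gradient
    descent and record, at the start of every epoch (before the update), the
    probability assigned to each sample's given label."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    history = np.empty((epochs, n))
    for epoch in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        history[epoch] = probs[np.arange(n), y]      # prob. of the given label
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    confidence = history.mean(axis=0)   # assumed statistic: mean over epochs
    variability = history.std(axis=0)   # assumed statistic: spread over epochs
    return confidence, variability

# Synthetic stand-in for an expression matrix: two well-separated "cell types".
rng = np.random.default_rng(0)
n_per_class = 100
X = np.vstack([rng.normal(-2.0, 1.0, size=(n_per_class, 5)),
               rng.normal(+2.0, 1.0, size=(n_per_class, 5))])
y_true = np.repeat([0, 1], n_per_class)

# Deliberately flip 10 labels to mimic erroneous annotations.
y_noisy = y_true.copy()
flipped = rng.choice(len(y_noisy), size=10, replace=False)
y_noisy[flipped] = 1 - y_noisy[flipped]

confidence, variability = training_dynamics(X, y_noisy, n_classes=2)

# Cells whose given label the model never learns to predict are candidates
# for annotation mismatches.
suspects = np.argsort(confidence)[:10]
```

In this toy setting the flipped cells end up with low confidence because the model keeps predicting their original class, so ranking by `confidence` surfaces them. Real single-cell data are far noisier, and cells with intermediate confidence or high variability may reflect ambiguous or intermediate states rather than outright errors.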