Malek Senoussi, Thierry Artières, Paul Villoutreix
Hierarchical novel class discovery for single-cell transcriptomic profiles
arXiv:2409.05937, arXiv - QuanBio - Quantitative Methods, 2024-09-09
Abstract
One of the major challenges arising from single-cell transcriptomics
experiments is the question of how to annotate the associated single-cell
transcriptomic profiles. Because of the large size and the high dimensionality
of the data, automated methods for annotation are needed. We focus here on
datasets obtained in the context of developmental biology, where the
differentiation process leads to a hierarchical structure. We consider a
frequent setting in which both labeled and unlabeled data are available at
training time, but the label set of the labeled data and the label set of the
unlabeled data are disjoint. This is an instance of the Novel Class Discovery
problem. The goal is twofold: clustering the data and mapping the clusters to
labels. We propose extensions of the k-means and Gaussian mixture model (GMM)
clustering methods to solve this problem and report comparative results on
artificial and experimental transcriptomic datasets. Our approaches take
advantage of the hierarchical nature of the data.
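To make the problem setting concrete, here is a minimal Python sketch of a plain baseline. It is not the authors' proposed k-means or GMM extension. It assumes a toy two-lineage hierarchy (parents A and B) with labeled known classes A1 and B1 and unlabeled novel classes A2 and B2, so the two label sets are disjoint. Standard k-means clusters the unlabeled cells. A brute-force search over one-to-one cluster-to-label mappings (equivalent to Hungarian matching for small k) scores the clustering against the hidden ground truth. A hypothetical helper, `attach_to_parents`, places each novel cluster under the parent of its nearest known class. That heuristic, all helper names and the synthetic 2-D data are illustrative assumptions only.

```python
import itertools

import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def farthest_first_init(X, k):
    """Deterministic seeding: start at X[0], then add the point farthest from all chosen seeds."""
    idx = [0]
    for _ in range(k - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means (not the paper's extension); returns (centroids, cluster per point)."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def best_mapping_accuracy(pred, true):
    """Accuracy under the best one-to-one cluster-to-label mapping (brute force, small k only)."""
    clusters, classes = np.unique(pred), np.unique(true)
    best = 0.0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, float(np.mean([mapping[p] == t for p, t in zip(pred, true)])))
    return best


def attach_to_parents(novel_centroids, known_centroids, known_parents):
    """Hypothetical heuristic: give each novel cluster the parent of its nearest known class."""
    d = ((novel_centroids[:, None, :] - known_centroids[None, :, :]) ** 2).sum(axis=2)
    return [known_parents[j] for j in d.argmin(axis=1)]


rng = np.random.default_rng(0)


def blob(center, n=50):
    """Synthetic stand-in for one cell type: an isotropic 2-D Gaussian cloud."""
    return rng.normal(loc=center, scale=0.5, size=(n, 2))


# Labeled data (known classes A1, B1) is used here only to compute class centroids.
known_centroids = np.array([blob((0, 0)).mean(axis=0), blob((20, 0)).mean(axis=0)])
known_parents = ["A", "B"]
# Unlabeled data contains only the novel classes A2, B2 (disjoint from the known labels).
X_unlab = np.vstack([blob((0, 6)), blob((20, 6))])
y_true = np.array(["A2"] * 50 + ["B2"] * 50)  # hidden; used only for evaluation

centroids, pred = kmeans(X_unlab, k=2)
print(best_mapping_accuracy(pred, y_true))  # 1.0
print(attach_to_parents(centroids, known_centroids, known_parents))  # ['A', 'B']
```

The nearest-known-class rule only holds when each novel type lies closer to a sibling type than to types of other lineages, as in this well-separated toy data; the sketch makes no claim about how it behaves on real high-dimensional transcriptomic profiles.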