Marcos López-De-Castro, Alberto García-Galindo, José González-Gomariz, Rubén Armañanzas
Bioinformatics (Oxford, England), published 2025-10-02. DOI: 10.1093/bioinformatics/btaf521
Conformal inference for reliable single cell RNA-seq annotation.
Motivation: Despite the inherent complexity of automatic cell type assignment, most supervised learning models overlook rigorous uncertainty quantification of their annotations. Although some existing pipelines incorporate rejection options under predefined circumstances, these usually rely on arbitrary assumptions and provide no statistical guarantees. In this work, we propose a methodology based on the conformal prediction framework to provide reliable single-cell annotations. Conformal prediction provides statistical guarantees on its predictions without assumptions about the underlying data distribution beyond exchangeability. Our proposal leverages conformal inference to address two critical challenges in single-cell RNA sequencing annotation: (i) detecting out-of-distribution cell types in the query data; and (ii) performing reliable uncertainty quantification of cell annotations through well-calibrated prediction sets.
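The two conformal ingredients named above can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: it assumes a held-out calibration set of labelled reference cells, uses 1 − (predicted probability of the label) as the non-conformity score for annotation, and takes as given some anomaly score for out-of-distribution detection (higher means more unusual; how that score is computed is an assumption left open here). All function names are hypothetical.

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-value per query cell: (1 + #{calibration scores >= test score}) / (n + 1)."""
    cal = np.asarray(cal_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # Count, for every query cell, how many calibration scores are at least as extreme.
    counts = (cal[None, :] >= test[:, None]).sum(axis=1)
    return (1.0 + counts) / (cal.size + 1.0)

def flag_out_of_distribution(cal_scores, test_scores, alpha=0.1):
    """Hypothetical OOD rule: flag cells whose conformal p-value is <= alpha."""
    return conformal_pvalues(cal_scores, test_scores) <= alpha

def split_conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrated threshold on the score s = 1 - p(true label)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    scores = 1.0 - cal_probs[np.arange(cal_labels.size), cal_labels]
    n = scores.size
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration cells: every label enters the set
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean matrix: label j is in cell i's set when 1 - p_ij <= qhat."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= qhat
```

In this sketch, cells flagged as out-of-distribution would be withheld from annotation, and each remaining cell receives every label whose score falls under the calibrated threshold. Under exchangeability, such a set contains the true label with probability at least 1 − alpha, marginally over cells.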
Results: We evaluated the anomaly detector and the uncertainty-aware annotator in 10 batched experiments derived from various tissues. Specifically, we studied three annotation taxonomies (standard, classwise, and cluster) alongside three non-conformity measures. The results showed that the anomaly detector effectively identified previously unseen cell types and that the resulting prediction sets were well calibrated. This rigorous annotation kept the empirical coverage at the level expected for the chosen significance level. Finally, we illustrate how integrating conformal prediction outputs enhances downstream analyses.
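Assuming the "classwise" taxonomy corresponds to class-conditional (Mondrian) conformal prediction, the difference from the standard variant is that one threshold is calibrated per cell type, so coverage is targeted within each class rather than only on average across cells. The sketch below illustrates that assumption with the 1 − p score; the function names are hypothetical, any of the three non-conformity measures could be substituted for the score, and the "cluster" variant is not shown.

```python
import numpy as np

def classwise_thresholds(cal_probs, cal_labels, alpha=0.1):
    """Mondrian split conformal: one calibrated threshold per class.

    Scores are 1 - p(true label). A class with too few calibration cells
    (including none) gets an infinite threshold, i.e. it is always included.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n_classes = cal_probs.shape[1]
    scores = 1.0 - cal_probs[np.arange(cal_labels.size), cal_labels]
    qhat = np.full(n_classes, np.inf)
    for k in range(n_classes):
        # Calibrate only on cells whose reference label is k.
        s = np.sort(scores[cal_labels == k])
        rank = int(np.ceil((s.size + 1) * (1 - alpha)))
        if rank <= s.size:
            qhat[k] = s[rank - 1]
    return qhat

def classwise_prediction_sets(test_probs, qhat):
    """Label k enters cell i's set when 1 - p_ik <= qhat[k] (thresholds broadcast over columns)."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= qhat[None, :]
```

The per-class loop is the only change from the standard split-conformal threshold; the price is that rare cell types with few calibration cells receive loose (here infinite) thresholds.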
Availability and implementation: The automatic scRNA-seq annotator is available at https://github.com/digital-medicine-research-group-UNAV/conformalized_single_cell_annotator and https://doi.org/10.5281/zenodo.15870599.