Hongxu Zhu, Amir Asiaee, Leila Azinfar, Jun Li, Han Liang, Ehsan Irajizad, Kim-Anh Do, James P Long
Briefings in Bioinformatics, volume 26, issue 5. Published 2025-08-31. Journal Article. DOI: 10.1093/bib/bbaf426 (https://doi.org/10.1093/bib/bbaf426). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400816/pdf/
AUPRC: a metric for evaluating the performance of in-silico perturbation methods in identifying differentially expressed genes.
In silico perturbation models, computational methods that predict cellular responses to perturbations, present an opportunity to reduce the need for costly and time-intensive in vitro experiments. Many recently proposed models predict high-dimensional cellular responses, such as gene or protein expression, to perturbations such as gene knockouts or drug treatments. However, evaluation of in silico performance has largely relied on metrics such as $R^{2}$, which assess overall prediction accuracy but fail to capture biologically significant outcomes like the identification of differentially expressed (DE) genes. In this study, we present a novel evaluation framework that uses the area under the precision-recall curve (AUPRC) to assess the precision and recall of DE gene predictions. By applying this framework to both single-cell and pseudo-bulked datasets, we systematically benchmark simple and advanced computational models. Our results highlight a significant discrepancy between $R^{2}$ and AUPRC: models can achieve high $R^{2}$ values yet struggle to identify DE genes, as reflected in their low AUPRC values. This finding underscores the limitations of traditional evaluation metrics and the importance of biologically relevant assessments. Our framework provides a more comprehensive understanding of model capabilities, advancing the application of computational approaches in cellular perturbation research.
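The gap between $R^{2}$ and AUPRC described above can be illustrated with a small sketch. The abstract does not specify how the authors compute AUPRC or score genes, so everything here is an assumption made for illustration: AUPRC is estimated by average precision (a common step-wise estimator), genes are ranked by the absolute predicted change from control, and the six-gene toy data, function names, and variable names are all hypothetical.

```python
# Illustrative sketch only: not the authors' implementation.
# Assumptions: AUPRC is estimated by average precision, and each gene's
# DE score is the absolute predicted change from control expression.

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each true positive.

    labels: 1 for a truly DE gene, 0 otherwise.
    scores: higher means "more likely DE". Ties are broken by input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: six genes, two of which are truly DE.
control = [10.0, 8.0, 6.0, 4.0, 2.0, 1.0]
true_delta = [0.0, 0.0, 1.0, 0.0, 0.0, -1.0]       # genes 2 and 5 change
pred_delta = [0.3, -0.25, 0.05, 0.2, 0.1, -0.02]   # model misses both

observed = [c + d for c, d in zip(control, true_delta)]
predicted = [c + d for c, d in zip(control, pred_delta)]
is_de = [1 if d != 0.0 else 0 for d in true_delta]
de_scores = [abs(d) for d in pred_delta]

r2 = r_squared(observed, predicted)
ap = average_precision(is_de, de_scores)
prevalence = sum(is_de) / len(is_de)  # expected AP of a random ranking

print(f"R^2 = {r2:.3f}, average precision = {ap:.3f}, "
      f"DE prevalence = {prevalence:.3f}")
```

In this toy case the predictions track absolute expression closely, because baseline expression dominates the variance, so $R^{2}$ is above 0.95. The two truly DE genes nevertheless receive the lowest scores, which puts the average precision below the DE prevalence that a random ranking would be expected to reach.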
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.