Error Reduction in Leukemia Machine Learning Classification With Conformal Prediction.

Impact factor 2.8, JCR Q2 (Oncology)

JCO Clinical Cancer Informatics. Pub Date: 2025-05-01. Epub Date: 2025-05-28. DOI: 10.1200/CCI-24-00324

Mariya Lysenkova Wiklander, Dave Zachariah, Olga Krali, Jessica Nordlund

Volume 9, article e2400324. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133051/pdf/

Citations: 0

Abstract



Error Reduction in Leukemia Machine Learning Classification With Conformal Prediction.

Purpose: Recent advances in machine learning have led to the development of classifiers that predict molecular subtypes of acute lymphoblastic leukemia (ALL) using RNA-sequencing (RNA-seq) data. Although these models have shown promising results, they often lack robust performance guarantees. The aim of this study was three-fold: to quantify the uncertainty of these classifiers, to provide prediction sets that control the false-negative rate (FNR), and to perform implicit error reduction by transforming incorrect predictions into uncertain predictions.

Methods: Conformal prediction (CP) is a distribution-agnostic framework for generating statistically calibrated prediction sets whose size reflects model uncertainty. In this study, we applied an extension called conformal risk control to three RNA-seq ALL subtype classifiers. Leveraging RNA-seq data from 1,227 patient samples taken at diagnosis, we developed a multiclass conformal predictor ALLCoP, which generates statistically guaranteed FNR-controlled prediction sets.
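As a rough illustration of how conformal risk control can yield FNR-controlled prediction sets, the sketch below calibrates a single threshold parameter on held-out data. It is not the ALLCoP implementation: the function names (`prediction_set`, `fn_loss`, `calibrate_lambda`), the thresholding rule (include every class whose score is at least 1 - lambda), the grid search, and the toy scores are all assumptions made for illustration. The calibration step follows the general conformal risk control recipe for a loss bounded by 1: choose the smallest lambda whose adjusted empirical risk, (n / (n + 1)) * risk + 1 / (n + 1), does not exceed the tolerance alpha.

```python
def prediction_set(scores, lam):
    """Classes whose score is at least 1 - lam (a larger lam gives a larger set)."""
    return {k for k, s in enumerate(scores) if s >= 1.0 - lam}


def fn_loss(scores, true_classes, lam):
    """Fraction of the true classes that are missing from the prediction set."""
    pred = prediction_set(scores, lam)
    return 1.0 - len(pred & set(true_classes)) / len(true_classes)


def calibrate_lambda(cal_scores, cal_truth, alpha, steps=1000):
    """Smallest lambda on a uniform grid whose adjusted empirical risk is <= alpha.

    cal_scores : list of per-class score lists from a classifier (hypothetical input)
    cal_truth  : list of non-empty lists of true class indices
    alpha      : FNR tolerance, e.g. 0.1
    """
    n = len(cal_scores)
    for i in range(steps + 1):
        lam = i / steps
        risk = sum(fn_loss(s, t, lam) for s, t in zip(cal_scores, cal_truth)) / n
        # The loss is bounded by 1, so the finite-sample correction is 1 / (n + 1).
        if (n / (n + 1)) * risk + 1.0 / (n + 1) <= alpha:
            return lam
    # No grid value satisfies the bound (too few calibration samples for this alpha):
    # fall back to lam = 1.0, which includes every class.
    return 1.0


# Toy calibration data, invented for illustration only.
cal_scores = [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
cal_truth = [[0], [1], [1], [2]]
lam = calibrate_lambda(cal_scores, cal_truth, alpha=0.5)
print(lam, prediction_set([0.55, 0.35, 0.10], lam))
```

A single monotone parameter keeps the search simple: raising lambda can only add classes to a set, so the false-negative loss never increases, which is the property the risk-control guarantee relies on.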

Results: ALLCoP was able to create prediction sets with specified FNR tolerances ranging from 7.5% to 30%. In a validation cohort, ALLCoP successfully reduced the FNR of the ALLIUM RNA-seq ALL subtype classifier from 8.95% to 3.5%. For patients whose subtype was not previously known, the use of ALLCoP was able to reduce the occurrence of empty predictions from 37% to 17%. Notably, up to 34% of the multiple-class prediction sets included the PAX5alt subtype, suggesting that increased prediction set size may reflect secondary aberrations and biological complexity, contributing to classifier uncertainty. Finally, ALLCoP was validated on two additional RNA-seq ALL subtype classifiers, ALLSorts and ALLCatchR.
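The set-level quantities reported above (empty predictions, multiple-class sets, and the share of multiple-class sets containing a given subtype) can be tallied from any collection of prediction sets. The helper below is a minimal sketch of that bookkeeping; `summarize_sets`, its output keys, and the toy sets are hypothetical and do not reproduce the study's data or code.

```python
def summarize_sets(pred_sets, subtype):
    """Summarize a non-empty list of prediction sets (each a set of subtype labels)."""
    n = len(pred_sets)
    empty = sum(1 for s in pred_sets if len(s) == 0)
    singleton = sum(1 for s in pred_sets if len(s) == 1)
    multiple = [s for s in pred_sets if len(s) > 1]
    with_subtype = sum(1 for s in multiple if subtype in s)
    return {
        "empty": empty / n,            # no subtype could be called
        "singleton": singleton / n,    # one confident call
        "multiple": len(multiple) / n, # uncertain between several subtypes
        # Share of multiple-class sets that contain the subtype of interest.
        "subtype_in_multiple": with_subtype / len(multiple) if multiple else 0.0,
    }


# Toy prediction sets with placeholder labels, invented for illustration only.
toy_sets = [set(), {"subtype_A"}, {"subtype_A", "PAX5alt"}, {"subtype_A", "subtype_B"}]
print(summarize_sets(toy_sets, "PAX5alt"))
```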

Conclusion: Our results highlight the potential of CP in enhancing the use of oncologic RNA-seq subtyping classifiers and also in uncovering additional molecular aberrations of potential clinical importance.


Source journal

JCO Clinical Cancer Informatics (Oncology)

CiteScore: 6.20

Self-citation rate: 4.80%

Publication volume: 190