ROBI: a Robust and Optimized Biomarker Identifier to increase the likelihood of discovering relevant radiomic features

medRxiv - Health Informatics Pub Date : 2024-09-10 DOI:10.1101/2024.09.09.24313059

Louis Rebaud, Nicolo Capobianco, Clementine Sarkozy, Anne-Segolene Cottereau, Laetitia Vercellino, Olivier Casasnovas, Catherine Thieblemont, Bruce Spottiswoode, Irene Buvat


Citation count: 0

Abstract


Objectives: The Robust and Optimized Biomarker Identifier (ROBI) feature selection pipeline is introduced to improve the identification of informative biomarkers coding information not already captured by existing features. It aims to accurately maximize the number of discoveries while minimizing and estimating the number of false positives (FP) with an adjustable selection stringency.

Methods: 500 synthetic datasets and retrospective data from 378 Diffuse Large B Cell Lymphoma (DLBCL) patients were used for validation. On the DLBCL data, two established radiomic biomarkers, TMTV and Dmax, were measured from the 18F-FDG PET/CT scans, and 10,000 random biomarkers were generated. Selection was performed and verified on each dataset. The efficacy of ROBI was compared to methods controlling for multiple testing and to a Cox model with an Elastic Net penalty.

Results: On synthetic datasets, ROBI selected significantly more true positives (TP) than FP (p < 0.001), and for 99.3% of datasets, the number of FP was within the estimated 95% confidence interval. ROBI significantly increased the number of TP compared to usual feature selection methods (p < 0.001). On retrospective data, ROBI selected the two established biomarkers and one random biomarker, and estimated a 95% chance of selecting 0 or 1 FP and a probability of 0.0014 of selecting only FP. Bonferroni correction selected no feature, and Elastic Net selected 101 spurious features and discarded TMTV.

Conclusion: ROBI selected relevant biomarkers while effectively controlling for FPs, outperforming conventional selection methods. This underscores its potential as a valuable asset for biomarker discovery.
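To make the comparison with multiple-testing corrections concrete, the following minimal sketch shows how a Bonferroni threshold shrinks when 10,002 candidate features are tested at once. It is not the ROBI pipeline and does not reproduce the study's analysis. The p-values are hypothetical, and the Benjamini-Hochberg procedure is included only as an assumed example of a method controlling for multiple testing; the abstract does not say which corrections other than Bonferroni were evaluated.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


def benjamini_hochberg_select(p_values, alpha=0.05):
    """Return indices selected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Sort feature indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below rank * alpha / m.
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values (NOT taken from the study): two moderately strong
# candidates followed by 10,000 uninformative features whose p-values are
# spread evenly over (0, 1).
n_random = 10_000
p_values = [1e-4, 3e-4] + [(k + 1) / (n_random + 1) for k in range(n_random)]

# With 10,002 tests, the per-feature Bonferroni threshold is about 5e-6.
bonferroni_threshold = 0.05 / len(p_values)

selected_bonferroni = bonferroni_select(p_values)
selected_bh = benjamini_hochberg_select(p_values)
```

In this hypothetical setting both lists come back empty: a candidate with p = 1e-4 would be selected when tested alone, but it does not survive a correction over 10,002 features. This is the kind of lost sensitivity the abstract reports for the Bonferroni correction on the DLBCL data.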
