ROBI: a Robust and Optimized Biomarker Identifier to increase the likelihood of discovering relevant radiomic features
Louis Rebaud, Nicolo Capobianco, Clementine Sarkozy, Anne-Segolene Cottereau, Laetitia Vercellino, Olivier Casasnovas, Catherine Thieblemont, Bruce Spottiswoode, Irene Buvat
medRxiv - Health Informatics, published 2024-09-10. DOI: 10.1101/2024.09.09.24313059 (https://doi.org/10.1101/2024.09.09.24313059)
Abstract
Objectives: The Robust and Optimized Biomarker Identifier (ROBI) feature selection pipeline is introduced to improve the identification of informative biomarkers that encode information not already captured by existing features. It aims to maximize the number of discoveries while minimizing, and accurately estimating, the number of false positives (FP), with an adjustable selection stringency.
Methods: ROBI was validated on 500 synthetic datasets and on retrospective data from 378 patients with diffuse large B-cell lymphoma (DLBCL). On the DLBCL data, two established radiomic biomarkers, TMTV and Dmax, were measured from the 18F-FDG PET/CT scans, and 10,000 random biomarkers were generated. Selection was performed and verified on each dataset. The efficacy of ROBI was compared with that of methods controlling for multiple testing and of a Cox model with an elastic net penalty.
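For context on the comparison methods, the sketch below applies two standard multiple-testing procedures to a list of per-feature p-values. The abstract names Bonferroni correction; Benjamini-Hochberg is included only as an assumed example of another such method. The function names, the 0.05 level, and the toy p-values are illustrative and not taken from the paper.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Select features whose p-value is at most alpha / m (m = number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_select(p_values, alpha=0.05):
    """Select features with the Benjamini-Hochberg step-up procedure.

    Included as an assumed example of a multiple-testing method; the abstract
    does not say which procedures other than Bonferroni were compared.
    """
    m = len(p_values)
    # Feature indices sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    last_pass = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k such that p_(k) <= alpha * k / m.
        if p_values[idx] <= alpha * rank / m:
            last_pass = rank
    selected = [False] * m
    for idx in order[:last_pass]:
        selected[idx] = True
    return selected


# Toy input (hypothetical values): two small p-values among 10,000 null-like ones.
m = 10_002
p_values = [1e-7, 8e-6] + [0.5] * (m - 2)

bonferroni_mask = bonferroni_select(p_values)
bh_mask = benjamini_hochberg_select(p_values)
print(sum(bonferroni_mask), sum(bh_mask))  # prints "1 2"
```

In this toy case the second p-value (8e-6) falls just above the Bonferroni threshold but below its Benjamini-Hochberg threshold, which shows how the choice of correction changes what is selected when thousands of candidates are tested.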
Results: On synthetic datasets, ROBI selected significantly more true positives (TP) than FP (p < 0.001), and for 99.3% of datasets, the number of FP was within the estimated 95% confidence interval. ROBI significantly increased the number of TP compared with the usual feature selection methods (p < 0.001). On retrospective data, ROBI selected the two established biomarkers and one random biomarker, and estimated a 95% chance of selecting 0 or 1 FP and a probability of 0.0014 of selecting only FP. Bonferroni correction selected no features, and the elastic net selected 101 spurious features and discarded TMTV.
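To give a sense of scale for the retrospective experiment, the arithmetic below works out the Bonferroni threshold for 10,002 candidate features and the number of false positives one would expect if the 10,000 random features were screened at an uncorrected level. The 0.05 level and the assumption of independent, uniformly distributed null p-values are illustrative; the abstract does not state the thresholds actually used, and this is not ROBI's own false-positive estimator.

```python
import math

N_ESTABLISHED = 2    # TMTV and Dmax (from the abstract)
N_RANDOM = 10_000    # randomly generated features (from the abstract)
ALPHA = 0.05         # assumed significance level, not stated in the abstract

n_features = N_ESTABLISHED + N_RANDOM

# Bonferroni: each feature must reach p <= alpha / number of tests.
bonferroni_threshold = ALPHA / n_features

# Uncorrected screening of null features: the false-positive count is
# Binomial(N_RANDOM, ALPHA) under the independence assumption above.
expected_fp = ALPHA * N_RANDOM
sd_fp = math.sqrt(N_RANDOM * ALPHA * (1 - ALPHA))

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"Expected uncorrected FP: {expected_fp:.0f} (SD {sd_fp:.1f})")
```

Under these assumptions the per-feature Bonferroni threshold is about 5e-6, while uncorrected screening would pass roughly 500 random features. This is the gap between overly strict and overly permissive selection that the reported results describe.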
Conclusion: ROBI selected relevant biomarkers while effectively controlling the number of FP, outperforming conventional selection methods. This underscores its potential as a valuable asset for biomarker discovery.