A Novel Information Complexity Approach to Score Receiver Operating Characteristic (ROC) Curve Modeling
Authors: Aylin Gocoglu, Neslihan Demirel, Hamparsum Bozdogan
Journal: Entropy, vol. 26, no. 11 (published 2024-11-17)
DOI: https://doi.org/10.3390/e26110988
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11592642/pdf/
Journal impact factor: 2.1 (JCR Q2, Physics, Multidisciplinary)
Citations: 0
Abstract
Performance metrics measure how well a model predicts or classifies. However, no single metric suffices, since each one emphasizes a different aspect of classification. Model selection procedures based on information criteria offer a quantitative measure that balances model complexity against goodness of fit, providing a better alternative to classical approaches. In this paper, we introduce and develop a novel Information Complexity-Receiver Operating Characteristic (ICOMP-ROC) criterion to fit ROC curve models and study their performance. We construct and derive the Universal ROC (UROC) for a combination of sixteen Bi-distributional ROC models and choose the best Bi-distributional ROC model by minimizing the ICOMP-ROC criterion. We conduct large-scale Monte Carlo simulations using the sixteen Bi-distributional ROC models, with the Normal-Normal and Weibull-Gamma pairs as the pseudo-true ROC models. We report the frequency of hits of the ICOMP-ROC criterion, showing its remarkable recovery rate. In addition to Bi-distributional fitting, we consider a real, high-dimensional Magnetic Resonance Imaging (MRI) of the Brain dataset and the Wisconsin Breast Cancer (WBC) dataset to study how the common performance metrics and the ICOMP-ROC criterion perform with several machine learning (ML) classification algorithms. We use the genetic algorithm (GA) to reduce the dimensions of these two datasets by choosing the best subset of features, and we then compare the performance of the newly proposed ICOMP-ROC criterion with that of the traditional performance metrics. The choice of a suitable metric depends not only on the ML model used but also on the complexity and high dimensionality of the input datasets, since the traditional performance metrics give different results and have inherent limitations. Our numerical results show the consistency and reliability of the ICOMP-ROC criterion, relative to the traditional performance metrics, as a model selection criterion for choosing the best-fitting Bi-distributional ROC model and the best classification algorithm among those considered. This shows the utility and versatility of our newly proposed approach to ROC curve modeling, which integrates and robustifies currently used procedures.
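To make the idea of scoring a Bi-distributional ROC model concrete, the following is a minimal sketch for the Normal-Normal pair only. It is not the authors' implementation: the abstract does not give the exact form of ICOMP-ROC, so the sketch assumes the general ICOMP(IFIM) form, -2 log L + 2 C1(F^-1), where C1 is the first-order complexity of the estimated inverse Fisher information matrix. All function names are hypothetical, and the data are simulated purely for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def fit_normal(xs):
    """Maximum-likelihood mean and variance (variance divides by n)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def normal_loglik(xs, mu, var):
    """Gaussian log-likelihood of xs at (mu, var)."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)

def c1_complexity(diag):
    """First-order complexity C1 of a diagonal covariance matrix."""
    s = len(diag)
    return 0.5 * s * math.log(sum(diag) / s) - 0.5 * sum(math.log(d) for d in diag)

def icomp_normal_pair(neg, pos):
    """ASSUMED generic ICOMP(IFIM) score for a Normal-Normal pair."""
    mu0, var0 = fit_normal(neg)
    mu1, var1 = fit_normal(pos)
    loglik = normal_loglik(neg, mu0, var0) + normal_loglik(pos, mu1, var1)
    n0, n1 = len(neg), len(pos)
    # Inverse Fisher information of (mu, var) per group is diag(var/n, 2*var^2/n).
    inv_fisher_diag = [var0 / n0, 2 * var0**2 / n0, var1 / n1, 2 * var1**2 / n1]
    return -2 * loglik + 2 * c1_complexity(inv_fisher_diag)

def binormal_roc(t, mu0, var0, mu1, var1):
    """TPR at false-positive rate t in (0, 1); higher scores mean positive."""
    a = (mu1 - mu0) / math.sqrt(var1)
    b = math.sqrt(var0 / var1)
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))

def binormal_auc(mu0, var0, mu1, var1):
    """Area under the Normal-Normal ROC curve."""
    return STD_NORMAL.cdf((mu1 - mu0) / math.sqrt(var0 + var1))

# Simulated scores (illustrative only, not data from the paper).
rng = random.Random(0)
neg = [rng.gauss(0.0, 1.0) for _ in range(200)]
pos = [rng.gauss(1.5, 1.2) for _ in range(200)]
mu0, var0 = fit_normal(neg)
mu1, var1 = fit_normal(pos)
print("ICOMP (Normal-Normal):", round(icomp_normal_pair(neg, pos), 2))
print("TPR at FPR 0.1:", round(binormal_roc(0.1, mu0, var0, mu1, var1), 3))
print("AUC:", round(binormal_auc(mu0, var0, mu1, var1), 3))
```

Under these assumptions, selecting among candidates would amount to computing such a score for each distribution pair and keeping the pair with the smallest value. Other pairs, such as Weibull-Gamma, would need their own likelihoods and Fisher information matrices.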
About the journal:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where computations or experiments are involved, the details must be provided so that the results can be reproduced.