A Novel Information Complexity Approach to Score Receiver Operating Characteristic (ROC) Curve Modeling.

Impact factor: 2.1 · CAS Zone 3 (Physics and Astrophysics) · JCR Q2 (PHYSICS, MULTIDISCIPLINARY)

Entropy, vol. 26, no. 11 · Published 2024-11-17 · DOI: 10.3390/e26110988

Aylin Gocoglu, Neslihan Demirel, Hamparsum Bozdogan

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11592642/pdf/


Abstract

Performance metrics measure how accurately a model makes predictions or classifications. However, no single metric suffices, because each one emphasizes a different aspect of classification. Model selection procedures based on information criteria offer a quantitative measure that balances model complexity against goodness of fit, providing a better alternative to classical approaches. In this paper, we introduce and develop a novel criterion, the Information Complexity-Receiver Operating Characteristic (ICOMP-ROC), to fit ROC curve models and study their performance. We construct and derive the Universal ROC (UROC) for a combination of sixteen Bi-distributional ROC models and choose the best Bi-distributional ROC by minimizing the ICOMP-ROC criterion. We conduct large-scale Monte Carlo simulations over the sixteen Bi-distributional ROC models, using the Normal-Normal and Weibull-Gamma pairs as the pseudo-true ROC models, and report the frequency of hits of the ICOMP-ROC criterion (how often it selects the pseudo-true model), which shows a remarkable recovery rate. Beyond Bi-distributional fitting, we use a high-dimensional real Magnetic Resonance Imaging (MRI) of the Brain dataset and the Wisconsin Breast Cancer (WBC) dataset to compare common performance metrics with the ICOMP-ROC criterion across several machine learning (ML) classification algorithms. We use a genetic algorithm (GA) to reduce the dimensionality of both datasets by selecting the best subset of features, and then compare the newly proposed ICOMP-ROC criterion with the traditional performance metrics. Because the traditional performance metrics give differing results and have inherent limitations, the choice of a suitable metric depends not only on the ML model used but also on the complexity and high dimensionality of the input datasets. Our numerical results show that ICOMP-ROC is more consistent and reliable than the traditional performance metrics as a model selection criterion, both for choosing the best-fitting Bi-distributional ROC model and for choosing the best classification algorithm among those considered. These results demonstrate the utility and versatility of the proposed approach to ROC curve modeling, which integrates and robustifies currently used procedures.
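The abstract does not spell out the ICOMP-ROC formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes Bozdogan's general ICOMP(IFIM) form, ICOMP = -2 log L(θ̂) + 2 C1(F̂⁻¹), with C1(Σ) = (s/2) log(tr(Σ)/s) - (1/2) log|Σ|, and applies it to a hypothetical candidate set of 3 × 3 = 9 bi-distributional score models (Normal, Weibull, Gamma for each class) instead of the paper's sixteen. The function names (`fit_class`, `icomp_bidist`, `select_pair`, `bidist_roc`), the finite-difference observed Fisher information, and the zero-location Weibull/Gamma parameterization are all assumptions made for this example. The fitted pair induces the ROC curve ROC(t) = 1 - F1(F0⁻¹(1 - t)).

```python
import numpy as np
from scipy import stats

# Hypothetical candidate score distributions (the paper uses sixteen pairs;
# this sketch uses 3 x 3 = 9). Weibull and gamma have location fixed at 0.
CANDIDATES = {
    "normal": (stats.norm, {}),
    "weibull": (stats.weibull_min, {"floc": 0}),
    "gamma": (stats.gamma, {"floc": 0}),
}


def frozen(name, th):
    """Frozen scipy distribution from a 2-vector of free parameters."""
    dist, fixed = CANDIDATES[name]
    if "floc" in fixed:
        return dist(th[0], loc=0.0, scale=th[1])  # (shape, scale)
    return dist(loc=th[0], scale=th[1])  # (loc, scale)


def fit_class(x, name):
    """Maximum-likelihood fit for one class; returns (theta, negative log-likelihood)."""
    dist, fixed = CANDIDATES[name]
    params = dist.fit(x, **fixed)
    theta = np.array([params[0], params[-1]], dtype=float)
    return theta, lambda th: -np.sum(frozen(name, th).logpdf(x))


def numerical_hessian(f, theta, rel_step=1e-4):
    """Central finite-difference Hessian (observed Fisher information when f is the NLL)."""
    k = theta.size
    h = rel_step * np.maximum(np.abs(theta), 1.0)
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i], ej[j] = h[i], h[j]
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h[i] * h[j])
    return H


def c1_complexity(cov):
    """Bozdogan's C1 = (s/2) log(tr(cov)/s) - (1/2) log|cov|, assuming full rank s."""
    s = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def icomp_bidist(neg, pos, neg_name, pos_name):
    """ICOMP(IFIM) = -2 log L + 2 C1(F^-1) for one bi-distributional model."""
    th0, nll0 = fit_class(neg, neg_name)
    th1, nll1 = fit_class(pos, pos_name)
    # The two classes are modeled independently, so F^-1 is block-diagonal.
    z = np.zeros((2, 2))
    cov = np.block([[np.linalg.inv(numerical_hessian(nll0, th0)), z],
                    [z, np.linalg.inv(numerical_hessian(nll1, th1))]])
    return 2.0 * (nll0(th0) + nll1(th1)) + 2.0 * c1_complexity(cov)


def select_pair(neg, pos):
    """Score every (negative, positive) pair; the minimum ICOMP wins."""
    scores = {(a, b): icomp_bidist(neg, pos, a, b)
              for a in CANDIDATES for b in CANDIDATES}
    return min(scores, key=scores.get), scores


def bidist_roc(t, neg_name, th0, pos_name, th1):
    """ROC(t) = 1 - F1(F0^-1(1 - t)) at false-positive rate t."""
    return 1.0 - frozen(pos_name, th1).cdf(frozen(neg_name, th0).ppf(1.0 - t))
```

For example, `best, scores = select_pair(neg, pos)` on two arrays of strictly positive classifier scores returns the (negative-class, positive-class) pair with the lowest ICOMP, and `bidist_roc` then traces the corresponding fitted ROC curve. Unlike AIC, the C1 term penalizes how unevenly the parameter estimates' uncertainty is spread (through the trace and determinant of the estimated covariance) rather than only counting parameters; it is not invariant to reparameterization, so the (shape, scale) choice above matters.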




Source journal

Entropy (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days

About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish their theoretical and experimental work in as much detail as possible. There is no restriction on the length of papers. If the work involves computations or experiments, full details must be provided so that the results can be reproduced.