Title: Analysis of classification metric behaviour under class imbalance
Authors: Jean-Pierre van Zyl, Andries Petrus Engelbrecht
Journal: Egyptian Informatics Journal, volume 31, Article 100711 (impact factor 4.3; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-07-01
DOI: 10.1016/j.eij.2025.100711
URL: https://www.sciencedirect.com/science/article/pii/S1110866525001045
Citation count: 0
Abstract
Class imbalance is a skew in the distribution of the target variable in a dataset. In other words, class imbalance occurs when the classes of the target variable are assigned to the instances of a dataset in unequal proportions. Although the level of class imbalance is simply an inherent property of a dataset, highly skewed class distributions cause certain evaluation metrics to report misleading performance evaluations of a classification model. This paper reviews the history of existing performance evaluation metrics for classification, and uses a normalisation process to create new variations of these existing metrics which are more robust to class imbalance. Conclusions about the performance of the analysed metrics are drawn by performing the first extensive global sensitivity analysis of classification metrics. A statistical analysis technique, namely analysis of variance, is used to analyse the robustness to class imbalance of the existing metrics and the proposed metrics. This paper finds that most performance evaluation metrics for classification problems are highly sensitive to class imbalance, while the newly proposed alternative metrics tend to be more robust to class imbalance.
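The claim that skewed class distributions can make some metrics misleading can be illustrated with a small sketch. The example below is an assumption-laden illustration, not the paper's method: the dataset, the always-majority classifier, and the helper names `confusion_counts`, `accuracy` and `balanced_accuracy` are all hypothetical, and balanced accuracy is used only as a familiar class-normalised metric, not as one of the paper's proposed variations.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of instances classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    return (tpr + tnr) / 2


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy setting, plain accuracy rewards the classifier for the 95 majority-class instances even though it never detects the minority class, while the class-normalised score does not.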
About the journal:
The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer science, information technology, information systems, operations research and decision support. The Journal encourages the submission of innovative, previously unpublished work in the subjects it covers, whether from academic, research or commercial sources.