Empirical Comparison of Area under ROC Curve (AUC) and Matthews Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification

Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Pub Date: 2019-01-25, DOI: 10.1145/3310986.3311023

Chongomweru Halimu, Asem Kasem, S. Newaz


Citations: 65

Abstract


Selecting a suitable performance metric is a common challenge when performing classification and comparing classifiers, and it is particularly important when the data are class-imbalanced. The area under the Receiver Operating Characteristic curve (AUC) has been commonly used by the machine learning community in such situations, and researchers have recently started to use the Matthews Correlation Coefficient (MCC), especially in biomedical research. However, no empirical study has compared the suitability of the two metrics. This study aims to provide insight into how AUC and MCC compare with each other when used with classical machine learning algorithms over a range of imbalanced datasets. We use previously proposed criteria for comparing metrics, based on the degree of consistency and the degree of discriminancy, to compare AUC against MCC. We carry out experiments using four machine learning algorithms on 54 imbalanced datasets, with imbalance ratios ranging from 1% to 10%. The results show that AUC and MCC are statistically consistent with each other; however, AUC is more discriminating than MCC. The same is observed on 23 balanced datasets. This suggests that AUC is a better measure than MCC for evaluating and comparing classification algorithms.
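The abstract names the two metrics and the two comparison criteria without spelling them out. The following is a minimal sketch, not the authors' code, of how they could be computed. It assumes the pairwise definitions commonly attributed to Huang and Ling: consistency is the fraction of classifier pairs, among those where both metrics differ, that the two metrics rank in the same order, and discriminancy is the ratio of pairs that only the first metric separates to pairs that only the second separates. The toy labels, the scores, the 0.5 threshold, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consistency_and_discriminancy(f_vals, g_vals, tol=1e-12):
    """Pairwise degree of consistency C and discriminancy D of metric f versus g."""
    agree = disagree = f_only = g_only = 0
    for (fa, ga), (fb, gb) in combinations(zip(f_vals, g_vals), 2):
        df, dg = fa - fb, ga - gb
        f_diff, g_diff = abs(df) > tol, abs(dg) > tol
        if f_diff and g_diff:
            if df * dg > 0:
                agree += 1      # both metrics rank the pair the same way
            else:
                disagree += 1   # the metrics rank the pair in opposite order
        elif f_diff:
            f_only += 1         # only f tells the two classifiers apart
        elif g_diff:
            g_only += 1         # only g tells the two classifiers apart
    c = agree / (agree + disagree) if agree + disagree else float("nan")
    d = f_only / g_only if g_only else float("inf") if f_only else float("nan")
    return c, d


# Hypothetical toy data: 8 negatives and 2 positives.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
classifier_scores = {
    "a": [0.1, 0.2, 0.3, 0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.9],
    "b": [0.1, 0.2, 0.3, 0.1, 0.2, 0.8, 0.3, 0.4, 0.7, 0.9],
    "c": [0.1, 0.2, 0.3, 0.1, 0.2, 0.8, 0.3, 0.6, 0.4, 0.9],
    "d": [0.1, 0.2, 0.3, 0.1, 0.2, 0.4, 0.3, 0.4, 0.7, 0.9],
}
THRESHOLD = 0.5  # assumed cut-off; MCC needs hard labels, AUC uses raw scores

aucs = [auc(y, s) for s in classifier_scores.values()]
mccs = [mcc(y, [int(v >= THRESHOLD) for v in s]) for s in classifier_scores.values()]
c, d = consistency_and_discriminancy(aucs, mccs)
print("AUC:", aucs)
print("MCC:", mccs)
print("consistency:", c, "discriminancy (AUC vs MCC):", d)
```

In this toy run, classifiers a and b yield identical thresholded predictions and therefore the same MCC, while their AUC values differ, so that pair counts toward AUC in the discriminancy ratio. Classifiers a and d have equal AUC but different MCC, so that pair counts toward MCC. The toy numbers are for illustration only and are not meant to reproduce the paper's results.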


Source publication

Proceedings of the 3rd International Conference on Machine Learning and Soft Computing

Self-citation rate

0.00%

Publication count