Empirical Comparison of Area under ROC curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification
Authors: Chongomweru Halimu, Asem Kasem, S. Newaz
Published in: Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, 2019-01-25
DOI: 10.1145/3310986.3311023 (https://doi.org/10.1145/3310986.3311023)
Selecting a suitable performance metric is a common challenge when performing classification and comparing classifiers, and it is particularly important when the data are class-imbalanced. The area under the Receiver Operating Characteristic curve (AUC) has been commonly used by the machine learning community in such situations, and researchers have recently started to use the Matthews Correlation Coefficient (MCC), especially in biomedical research. However, no empirical study has compared the suitability of the two metrics. This study aims to provide insight into how AUC and MCC compare when used with classical machine learning algorithms over a range of imbalanced datasets. We compare AUC against MCC using previously proposed criteria for comparing metrics, based on the degree of consistency and the degree of discriminancy. We carry out experiments using four machine learning algorithms on 54 imbalanced datasets, with imbalance ratios ranging from 1% to 10%. The results demonstrate that AUC and MCC are statistically consistent with each other; however, AUC is more discriminating than MCC. The same observation holds on 23 balanced datasets. This suggests that AUC is a better measure than MCC for evaluating and comparing classification algorithms.
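To make the two comparison criteria concrete, the following is a minimal Python sketch, not the authors' code. It assumes the pairwise formulation commonly attributed to Huang and Ling: over all pairs of evaluated models, the degree of consistency is the share of pairs that both metrics rank in the same order (among pairs where both metrics differ), and the degree of discriminancy is the ratio of pairs that only the first metric separates to pairs that only the second separates. The abstract does not spell out these definitions, so treat them as an assumption. The function names, the 0.5 decision threshold used to obtain hard labels for MCC, and the toy scores are all hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient computed from hard 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consistency_and_discriminancy(f_vals, g_vals, tol=1e-12):
    """Compare metric f against metric g over all pairs of models.

    Returns (consistency, discriminancy); an entry is None when its
    denominator is zero. Definitions are assumed, see the text above.
    """
    agree = disagree = f_only = g_only = 0
    for i, j in combinations(range(len(f_vals)), 2):
        df = f_vals[i] - f_vals[j]
        dg = g_vals[i] - g_vals[j]
        f_differs = abs(df) > tol
        g_differs = abs(dg) > tol
        if f_differs and g_differs:
            if (df > 0) == (dg > 0):
                agree += 1
            else:
                disagree += 1
        elif f_differs:
            f_only += 1  # only f tells the two models apart
        elif g_differs:
            g_only += 1  # only g tells the two models apart
    consistency = agree / (agree + disagree) if agree + disagree else None
    discriminancy = f_only / g_only if g_only else None
    return consistency, discriminancy


# Toy data invented for illustration only (not from the study).
y_true = [1, 1, 0, 0, 0, 0]
model_scores = {
    "A": [0.90, 0.80, 0.3, 0.2, 0.1, 0.4],
    "B": [0.90, 0.60, 0.7, 0.2, 0.1, 0.4],
    "C": [0.90, 0.55, 0.7, 0.6, 0.1, 0.4],
    "D": [0.60, 0.55, 0.7, 0.2, 0.1, 0.4],
}
THRESHOLD = 0.5  # assumed cut-off for turning scores into hard labels

auc_vals = [auc(y_true, s) for s in model_scores.values()]
mcc_vals = [mcc(y_true, [int(x >= THRESHOLD) for x in s])
            for s in model_scores.values()]
consistency, discriminancy = consistency_and_discriminancy(auc_vals, mcc_vals)
print("AUC:", auc_vals)
print("MCC:", mcc_vals)
print("consistency:", consistency, "discriminancy:", discriminancy)
```

Under these assumed definitions, a consistency above 0.5 means the two metrics mostly agree on which model is better, and a discriminancy above 1 means the first metric (here AUC) separates more model pairs that the second (here MCC) scores as equal than the other way round.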