Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models.

Proceedings. IEEE International Conference on Data Mining. Pub Date: 2016-12-01. DOI: 10.1109/ICDM.2016.0047

Mahdi Pakdaman Naeini, Gregory F Cooper

Volume 2016, pages 360-369

Citations: 38

Abstract


Learning accurate probabilistic models from data is crucial in many practical data mining tasks. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered an extension of BBQ [20], a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC, namely the monotonicity assumption on the predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data ENIR commonly performs statistically significantly better than the other methods, and never worse. It improves the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in O(N log N) time, where N is the number of samples.
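To make the post-processing setup concrete, the following is a minimal sketch of the isotonic-regression baseline that the abstract refers to as IsoRegC. It is not ENIR and is not taken from the paper. The function names `fit_isotonic_calibrator` and `calibrate` are hypothetical, and the pool-adjacent-violators (PAVA) fitting procedure is an assumption about how such a baseline is typically implemented. The sketch fits a non-decreasing step function from raw classifier scores to probabilities, which is exactly the monotonicity assumption that ENIR is said to relax.

```python
from bisect import bisect_right


def fit_isotonic_calibrator(scores, labels):
    """Fit a non-decreasing step function score -> P(y=1) with PAVA.

    Hypothetical helper for illustration only (not the ENIR algorithm).
    Returns (thresholds, values): thresholds[i] is the smallest score in
    block i, values[i] is the calibrated probability for that block.
    """
    # Sorting dominates the cost at O(N log N); pooling below is O(N).
    pooled = []  # one entry per distinct score: [label_sum, count, score]
    for score, label in sorted(zip(scores, labels)):
        if pooled and pooled[-1][2] == score:
            pooled[-1][0] += label
            pooled[-1][1] += 1
        else:
            pooled.append([float(label), 1, score])

    blocks = []  # each block: [label_sum, count, min_score]
    for label_sum, count, score in pooled:
        blocks.append([label_sum, count, score])
        # Merge while the previous block's mean exceeds the last block's
        # mean (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            last_sum, last_count, _ = blocks.pop()
            blocks[-1][0] += last_sum
            blocks[-1][1] += last_count

    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Map a raw classifier score to its calibrated probability."""
    # Find the last block whose minimum score is <= score; scores below
    # the first threshold fall back to the first block.
    idx = bisect_right(thresholds, score) - 1
    return values[max(idx, 0)]


# Toy usage: uncalibrated scores and binary labels from a held-out set.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 1, 0, 0, 1, 1]
toy_thresholds, toy_values = fit_isotonic_calibrator(toy_scores, toy_labels)
print(calibrate(0.35, toy_thresholds, toy_values))
```

Because the calibrator only consumes scores and labels, it can sit behind any binary classifier, which matches the abstract's claim that post-processing methods of this kind work with many existing classification models.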

