关于生成所有最优单调分类的问题

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI:10.1109/ICDM.2011.111

Luite Stegeman, A. Feelders

{"title":"关于生成所有最优单调分类的问题","authors":"Luite Stegeman, A. Feelders","doi":"10.1109/ICDM.2011.111","DOIUrl":null,"url":null,"abstract":"In many applications of data mining one knows beforehand that the response variable should be monotone (either increasing or decreasing) in the attributes. In ordinal classification, changing the class labels of a data set (relabeling) so that the data becomes monotone, is useful for at least two reasons. Firstly, models trained on relabeled data tend to have better predictive performance than models trained on the original data. Secondly, relabeling is an important building block for the construction of monotone classifiers. However, optimal monotone relabelings are rarely unique, and so far an efficient algorithm to generate them all has been lacking. The main result of this paper is an efficient algorithm to produce the structure of all optimal monotone relabelings. We also show that counting the solutions is #P-complete and give algorithms for efficiently enumerating all solutions, as well as sampling uniformly from the set of solutions. Experiments show that relabeling non-monotone data can improve the predictive performance of models trained on that data.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"101 9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"On Generating All Optimal Monotone Classifications\",\"authors\":\"Luite Stegeman, A. Feelders\",\"doi\":\"10.1109/ICDM.2011.111\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In many applications of data mining one knows beforehand that the response variable should be monotone (either increasing or decreasing) in the attributes. In ordinal classification, changing the class labels of a data set (relabeling) so that the data becomes monotone, is useful for at least two reasons. Firstly, models trained on relabeled data tend to have better predictive performance than models trained on the original data. Secondly, relabeling is an important building block for the construction of monotone classifiers. However, optimal monotone relabelings are rarely unique, and so far an efficient algorithm to generate them all has been lacking. The main result of this paper is an efficient algorithm to produce the structure of all optimal monotone relabelings. We also show that counting the solutions is #P-complete and give algorithms for efficiently enumerating all solutions, as well as sampling uniformly from the set of solutions. Experiments show that relabeling non-monotone data can improve the predictive performance of models trained on that data.\",\"PeriodicalId\":106216,\"journal\":{\"name\":\"2011 IEEE 11th International Conference on Data Mining\",\"volume\":\"101 9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 11th International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2011.111\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 11th International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2011.111","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

在数据挖掘的许多应用中，人们事先知道属性中的响应变量应该是单调的(增加或减少)。在有序分类中，更改数据集的类标签(重新标记)使数据变得单调，至少有两个原因是有用的。首先，在重新标记数据上训练的模型往往比在原始数据上训练的模型具有更好的预测性能。其次，重新标注是构建单调分类器的重要组成部分。然而，最优单调重标记很少是唯一的，到目前为止，还缺乏一种有效的算法来生成它们。本文的主要成果是一种产生所有最优单调重标记结构的有效算法。我们还证明了计数解是# p -完全的，并给出了有效枚举所有解的算法，以及从解集中均匀抽样的算法。实验表明，对非单调数据进行重新标记可以提高基于该数据训练的模型的预测性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On Generating All Optimal Monotone Classifications

In many applications of data mining one knows beforehand that the response variable should be monotone (either increasing or decreasing) in the attributes. In ordinal classification, changing the class labels of a data set (relabeling) so that the data becomes monotone, is useful for at least two reasons. Firstly, models trained on relabeled data tend to have better predictive performance than models trained on the original data. Secondly, relabeling is an important building block for the construction of monotone classifiers. However, optimal monotone relabelings are rarely unique, and so far an efficient algorithm to generate them all has been lacking. The main result of this paper is an efficient algorithm to produce the structure of all optimal monotone relabelings. We also show that counting the solutions is #P-complete and give algorithms for efficiently enumerating all solutions, as well as sampling uniformly from the set of solutions. Experiments show that relabeling non-monotone data can improve the predictive performance of models trained on that data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE 11th International Conference on Data Mining

自引率

0.00%

发文量