{"title":"利用显著、正相关和相对类相关规则对不平衡数据集进行关联分类","authors":"Florian Verhein, S. Chawla","doi":"10.1109/ICDM.2007.63","DOIUrl":null,"url":null,"abstract":"The application of association rule mining to classification has led to a new family of classifiers which are often referred to as \"associative classifiers (ACs)\". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where \"imbalanced data sets\" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced datasets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":"{\"title\":\"Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets\",\"authors\":\"Florian Verhein, S. Chawla\",\"doi\":\"10.1109/ICDM.2007.63\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The application of association rule mining to classification has led to a new family of classifiers which are often referred to as \\\"associative classifiers (ACs)\\\". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where \\\"imbalanced data sets\\\" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced datasets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.\",\"PeriodicalId\":233758,\"journal\":{\"name\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"50\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2007.63\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.63","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets
The application of association rule mining to classification has led to a new family of classifiers which are often referred to as "associative classifiers (ACs)". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where "imbalanced data sets" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced datasets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.