{"title":"Classification by Frequent Association Rules","authors":"Md Rayhan Kabir, Osmar Zaiane","doi":"10.1145/3555776.3577848","DOIUrl":null,"url":null,"abstract":"Over the last two decades, Associative Classifiers have shown competitive performance in the task of predicting class labels. Along with the performance in accuracy, associative classifiers produce human-readable predictive rules which is very helpful to understand the decision process of the classifiers. Associative classifiers from early days suffer from the limitation requiring proper threshold value setting which is dataset-specific. Recently some studies eliminated that limitation by producing statistically significant rules. Though recent models showed very competitive performance with state-of-the-art classifiers, their performance is still impacted if the feature vector of the training data is very large. An ensemble model can solve this issue by training each base learner with a subset of the feature vector. In this study, we propose an ensemble model Classification by Frequent Association Rules (CFAR) using associative classifiers as base learners. In our approach, instead of using a classical ensemble and a voting method, we rank the generated rules based on predominance among base learners and select a subset of the rules for predicting class labels. We use 10 datasets from the UCI repository to evaluate the performance of the proposed model. Our ensemble approach CFAR eliminates the limitation of high memory requirement and runtime of recent associative classifiers if training datasets have large feature vectors. Among the datasets we used, along with increasing accuracy in most cases, CFAR removes the noisy rules which enhances the interpretability of the model.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555776.3577848","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Over the last two decades, associative classifiers have shown competitive performance in the task of predicting class labels. Along with their accuracy, associative classifiers produce human-readable predictive rules, which are very helpful for understanding the decision process of the classifier. Early associative classifiers suffered from the limitation of requiring proper threshold values, which are dataset-specific. Recently, some studies eliminated that limitation by producing statistically significant rules. Although these recent models are very competitive with state-of-the-art classifiers, their performance is still impacted when the feature vector of the training data is very large. An ensemble model can address this issue by training each base learner on a subset of the feature vector. In this study, we propose an ensemble model, Classification by Frequent Association Rules (CFAR), that uses associative classifiers as base learners. Instead of using a classical ensemble with a voting method, our approach ranks the generated rules by their predominance among the base learners and selects a subset of the rules for predicting class labels. We use 10 datasets from the UCI repository to evaluate the performance of the proposed model. When training datasets have large feature vectors, our ensemble approach CFAR eliminates the high memory requirements and long runtimes of recent associative classifiers. On the datasets we used, along with increasing accuracy in most cases, CFAR removes noisy rules, which enhances the interpretability of the model.
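To make the ensemble-and-ranking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy rule miner (`mine_rules`) that extracts single-feature class-association rules above a hypothetical confidence threshold `min_conf`, standing in for the statistically significant rules used in the paper, and a wrapper class `CFARSketch` with illustrative parameters (`n_learners`, `subset_frac`, `top_k`). What it does follow from the abstract is the overall structure: each base learner sees only a random subset of the feature vector, and the pooled rules are ranked by predominance (how many base learners produced the same rule) before a top subset is kept for prediction.

```python
# Sketch of a CFAR-style ensemble of associative classifiers.
# mine_rules, CFARSketch and all thresholds are illustrative assumptions,
# not the published method.
import random
from collections import Counter, defaultdict


def mine_rules(X, y, features, min_conf=0.7):
    """Toy miner: (feature_index, value) -> class rules above min_conf."""
    counts = defaultdict(Counter)                 # antecedent -> class counts
    for row, label in zip(X, y):
        for f in features:
            counts[(f, row[f])][label] += 1
    rules = []
    for antecedent, cls_counts in counts.items():
        label, hits = cls_counts.most_common(1)[0]
        conf = hits / sum(cls_counts.values())
        if conf >= min_conf:
            rules.append((antecedent, label, conf))
    return rules


class CFARSketch:
    """Ensemble of rule miners, each trained on a random feature subset;
    rules are ranked by predominance across base learners."""

    def __init__(self, n_learners=10, subset_frac=0.5, top_k=50, seed=0):
        self.n_learners = n_learners
        self.subset_frac = subset_frac
        self.top_k = top_k
        self.rng = random.Random(seed)
        self.selected_rules = []

    def fit(self, X, y):
        n_features = len(X[0])
        subset_size = max(1, int(self.subset_frac * n_features))
        predominance = Counter()                  # rule -> #learners producing it
        confidence = {}
        for _ in range(self.n_learners):
            feats = self.rng.sample(range(n_features), subset_size)
            for antecedent, label, conf in mine_rules(X, y, feats):
                key = (antecedent, label)
                predominance[key] += 1
                confidence[key] = max(conf, confidence.get(key, 0.0))
        # Keep only the most predominant rules; confidence breaks ties.
        ranked = sorted(predominance,
                        key=lambda k: (predominance[k], confidence[k]),
                        reverse=True)
        self.selected_rules = ranked[:self.top_k]
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, row):
        votes = Counter()
        for (feat, val), label in self.selected_rules:
            if row[feat] == val:
                votes[label] += 1
        return votes.most_common(1)[0][0] if votes else self.default
```

In this sketch, discarding rules outside the top-`top_k` by predominance plays the role described in the abstract: pruning noisy rules keeps the rule set small and readable, while the per-learner feature subsets keep the memory and runtime cost of rule mining bounded when the feature vector is large.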