DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework

Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han
{"title":"DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework","authors":"Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han","doi":"10.1137/1.9781611974348.64","DOIUrl":null,"url":null,"abstract":"Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":"22 1","pages":"567-575"},"PeriodicalIF":0.0000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/1.9781611974348.64","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.
DPClass:一个有效而简洁的基于判别模式的分类框架
基于模式的分类最初是为了使用选择的频繁模式来提高准确率而提出的,其中付出了许多努力来修剪大量的非判别性频繁模式。另一方面,基于树的模型可以很容易地建立不同特征之间的高阶交互,并且可以处理数值和分类特征以及高维特征,因此在许多分类任务中显示出强大的能力。通过利用这两种建模方法的优势,我们提出了一种自然有效的方法来解决基于模式的分类问题,即采用判别模式,即基于树的模型(例如随机森林)中从根到节点的前缀路径。此外,我们通过选择适合广义线性模型的最有效模式组合来进一步压缩判别模式的数量。因此,我们的基于判别模式的分类框架(DPClass)可以像以前最先进的算法一样执行得很好,仅使用非常有限的判别模式就提供了很好的可解释性,并且非常快地预测新数据。更具体地说,在我们的实验中,DPClass仅使用前20个判别模式就可以获得更好的准确性。这样生成的框架非常简洁,对人类专家来说具有很强的解释性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信