{"title":"DPClass:一个有效而简洁的基于判别模式的分类框架","authors":"Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han","doi":"10.1137/1.9781611974348.64","DOIUrl":null,"url":null,"abstract":"Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":"22 1","pages":"567-575"},"PeriodicalIF":0.0000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework\",\"authors\":\"Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han\",\"doi\":\"10.1137/1.9781611974348.64\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.\",\"PeriodicalId\":74533,\"journal\":{\"name\":\"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining\",\"volume\":\"22 1\",\"pages\":\"567-575\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1137/1.9781611974348.64\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/1.9781611974348.64","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework
Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.