Anytime Subgroup Discovery in High Dimensional Numerical Data

Romain Mathonat, Diana Nurbakova, Jean-François Boulicaut, Mehdi Kaytoue, bon
{"title":"Anytime Subgroup Discovery in High Dimensional Numerical Data","authors":"Romain Mathonat, Diana Nurbakova, Jean-François Boulicaut, Mehdi Kaytoue, bon","doi":"10.1109/DSAA53316.2021.9564223","DOIUrl":null,"url":null,"abstract":"Subgroup discovery (SD) enables one to elicit patterns that strongly discriminate a class label. When it comes to numerical data, most of the existing SD approaches perform data discretizations and thus suffer from information loss. A few algorithms avoid such a loss by considering the search space of every interval pattern built on the dataset numerical values and provide an “anytime” property: at any moment, they are able to provide a result that improves over time. Given a sufficient time/memory budget, they may eventually complete an exhaustive search. However, such approaches are often intractable when dealing with high-dimensional numerical data, for instance, when extracting features from real-life multivariate time series. To overcome such limitations, we propose MonteCloPi, an approach based on a bottom-up exploration of numerical patterns with a Monte Carlo Tree Search. It enables to have a better exploration-exploitation trade-off between exploration and exploitation when sampling huge search spaces. Our extensive set of experiments proves the efficiency of MonteCloPi on high-dimensional data with hundreds of attributes. We finally discuss the actionability of discovered subgroups when looking for skill analysis from Rocket League action logs.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSAA53316.2021.9564223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Subgroup discovery (SD) enables one to elicit patterns that strongly discriminate a class label. When it comes to numerical data, most of the existing SD approaches perform data discretizations and thus suffer from information loss. A few algorithms avoid such a loss by considering the search space of every interval pattern built on the dataset numerical values and provide an “anytime” property: at any moment, they are able to provide a result that improves over time. Given a sufficient time/memory budget, they may eventually complete an exhaustive search. However, such approaches are often intractable when dealing with high-dimensional numerical data, for instance, when extracting features from real-life multivariate time series. To overcome such limitations, we propose MonteCloPi, an approach based on a bottom-up exploration of numerical patterns with a Monte Carlo Tree Search. It enables to have a better exploration-exploitation trade-off between exploration and exploitation when sampling huge search spaces. Our extensive set of experiments proves the efficiency of MonteCloPi on high-dimensional data with hundreds of attributes. We finally discuss the actionability of discovered subgroups when looking for skill analysis from Rocket League action logs.
高维数值数据中的任意子群发现
子组发现(Subgroup discovery, SD)使人们能够得出强烈区分类标签的模式。当涉及到数值数据时,现有的SD方法大多会进行数据离散化,因此存在信息丢失的问题。一些算法通过考虑构建在数据集数值上的每个间隔模式的搜索空间并提供“随时”属性来避免这种损失:在任何时刻,它们都能够提供随时间推移而改进的结果。如果有足够的时间/内存预算,它们可能最终完成穷举搜索。然而,这种方法在处理高维数值数据时往往难以处理,例如,在从现实生活中的多变量时间序列中提取特征时。为了克服这些限制,我们提出了MonteCloPi,这是一种基于蒙特卡罗树搜索的自下而上的数值模式探索方法。当对巨大的搜索空间进行采样时,它可以在探索和利用之间进行更好的权衡。我们大量的实验证明了MonteCloPi在具有数百个属性的高维数据上的效率。最后,我们讨论了从《火箭联盟》的动作日志中寻找技能分析时所发现的子群组的可操作性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信