If Constraint-Based Mining is the Answer: What is the Constraint? (Invited Talk)

2008 IEEE International Conference on Data Mining Workshops. Pub Date: 2008-12-15. DOI: 10.1109/ICDMW.2008.96

Jean-François Boulicaut


Citations: 5

Abstract

Constraint-based mining has proven extremely useful. It has been applied not only to many pattern discovery settings (e.g., sequential pattern mining) but also, more recently, to classification and clustering tasks. It appears to be a key technology for an inductive database perspective on knowledge discovery in databases (KDD), and it is indeed an answer to important data mining issues (e.g., supporting a priori relevancy and subjective interestingness, but also achieving computational feasibility). However, few authors study the nature of constraints and their semantics. Considering several examples of nontrivial KDD processes, we discuss the Hows, Whys, and Whens of constraints in a broader context. Our thesis is that most typical data mining methods are constraint-based techniques and that it is worth studying and designing them as such. In many cases, we exploit constraints that are not really explicit (e.g., the objective function optimization of a clustering for a given similarity measure) and/or constraints whose operational semantics are relaxed with respect to their declarative counterparts (e.g., the optimization constraint is not enforced because of some local optimization heuristics). We think that it is important to make explicit every primitive constraint and the operators that combine them, because this constitutes the declarative semantics of the constraints and thus of the mining queries.

A well-studied challenge is then to design operational semantics, such as correct and complete solvers and/or relaxation schemes, for more or less complex constraints. Designing complete solvers has been extensively studied in useful yet limited settings (see, e.g., the algorithms for exploiting combinations of monotonic and anti-monotonic primitives). It is, however, clear that many relevant constraints lack such nice properties. On the other hand, understanding constraint relaxation strategies remains fairly open, certainly because of its intrinsically heuristic nature. Interestingly, recent approaches that suggest global pattern or model construction based on local patterns make it possible to revisit the relaxation issue thanks to constraint back-propagation possibilities. This can be discussed within a case study on constrained co-clustering.
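To make the remark about monotonic and anti-monotonic primitives concrete, the following is a minimal Python sketch of a levelwise itemset miner. It is not taken from the talk: the function name `mine`, its parameters, and the two example constraints (a minimum support threshold as the anti-monotonic primitive, "contains at least one item from a given set" as the monotonic one) are assumptions chosen purely for illustration.

```python
from itertools import combinations


def mine(transactions, min_support, must_contain_any):
    """Return all itemsets X with support(X) >= min_support and
    X sharing at least one item with must_contain_any.

    The support constraint is anti-monotonic: if X violates it, so does
    every superset of X, which allows safe pruning of the search space.
    The "contains one of these items" constraint is monotonic: if X
    satisfies it, so does every superset of X.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Number of transactions that contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    results = []
    while level:
        # The monotonic constraint only filters the output here. It cannot
        # prune, because an itemset that fails it may have supersets that pass.
        results.extend(x for x in level if x & must_contain_any)

        # Join frequent k-itemsets into (k+1)-candidates, then keep only
        # candidates whose k-subsets are all frequent (anti-monotonic pruning).
        frequent = set(level)
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent
                   for s in combinations(c, len(c) - 1))
            and support(c) >= min_support
        ]
    return results


# Hypothetical toy data, used only to exercise the sketch.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = mine(toy, min_support=3, must_contain_any={"c"})
```

In this sketch the conjunction of the two primitives is the declarative query, while the levelwise pruning is one possible operational semantics for it. The solver is complete here only because the support constraint is anti-monotonic; a constraint lacking such a property would not justify the same pruning step.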
