Discriminative frequent subgraph mining with optimality guarantees

IF 2.1 · Zone 4 (Mathematics) · JCR Q3 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Marisa Thoma, Hong Cheng, A. Gretton, Jiawei Han, H. Kriegel, Alex Smola, Le Song, Philip S. Yu, Xifeng Yan, Karsten M. Borgwardt
DOI: 10.1002/SAM.V3:5 (https://doi.org/10.1002/SAM.V3:5). Journal: Statistical Analysis and Data Mining, volume 3, pages 302-318. Published: 2010-10-01. Publication type: Journal Article. Open access: no.
Citations: 23

Abstract

The goal of frequent subgraph mining is to detect subgraphs that occur frequently in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it helps to prune the search space for discriminative frequent subgraphs during the mining process itself. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 302-318, 2010
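
The abstract does not define the quality criterion, so the following is only a minimal sketch of greedy feature selection under an assumed one. It assumes that each graph is represented by a binary vector marking which candidate subgraphs it contains, and that quality is the negated number of "correspondences", meaning pairs of graphs from different classes that the selected subgraphs cannot tell apart. The function names (`correspondences`, `quality`, `greedy_select`) and the toy data are hypothetical and are not taken from the article or from gSpan.

```python
from collections import Counter
from typing import List, Sequence


def correspondences(X: Sequence[Sequence[int]], y: Sequence[int],
                    selected: Sequence[int]) -> int:
    """Count cross-class pairs whose rows agree on every selected feature.

    ASSUMPTION: this correspondence count stands in for the article's
    quality criterion, which the abstract does not spell out.
    """
    pos: Counter = Counter()
    neg: Counter = Counter()
    for row, label in zip(X, y):
        pattern = tuple(row[f] for f in selected)
        if label == 1:
            pos[pattern] += 1
        else:
            neg[pattern] += 1
    # Graphs sharing a pattern are indistinguishable; each positive/negative
    # pair inside such a group is one correspondence.
    return sum(count * neg[pattern] for pattern, count in pos.items())


def quality(X: Sequence[Sequence[int]], y: Sequence[int],
            selected: Sequence[int]) -> int:
    """Higher is better: fewer unresolved cross-class pairs."""
    return -correspondences(X, y, selected)


def greedy_select(X: Sequence[Sequence[int]], y: Sequence[int],
                  k: int) -> List[int]:
    """Add the feature with the largest quality gain until k are chosen
    or no remaining feature improves the criterion."""
    selected: List[int] = []
    remaining = set(range(len(X[0])))
    current = quality(X, y, selected)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (smallest index wins).
        best = max(sorted(remaining),
                   key=lambda f: quality(X, y, selected + [f]))
        gain = quality(X, y, selected + [best]) - current
        if gain <= 0:
            break
        selected.append(best)
        remaining.remove(best)
        current += gain
    return selected


# Hypothetical toy data: rows are graphs, columns are candidate subgraphs.
X = [
    [1, 0, 1],  # class 1
    [1, 1, 0],  # class 1
    [0, 0, 1],  # class 0
    [0, 1, 0],  # class 0
]
y = [1, 1, 0, 0]

print(greedy_select(X, y, k=2))
```

In this toy example the first column alone separates the two classes, so the greedy loop stops after one feature because no further feature can reduce the correspondence count. The sketch covers only the selection step; it does not attempt the pruning inside gSpan that the abstract mentions.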
Source journal
Statistical Analysis and Data Mining
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
CiteScore: 3.20
Self-citation rate: 7.70%
Articles published: 43
Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that is accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria: they solve data analysis problems associated with massive, complex datasets; they develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research; they formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; or they provide surveys of prominent research topics.