Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining: latest articles

Mining top-k frequent items in a data stream with flexible sliding windows
Hoang Thanh Lam, T. Calders
{"title":"Mining top-k frequent items in a data stream with flexible sliding windows","authors":"Hoang Thanh Lam, T. Calders","doi":"10.1145/1835804.1835842","DOIUrl":"https://doi.org/10.1145/1835804.1835842","url":null,"abstract":"We study the problem of finding the k most frequent items in a stream of items for the recently proposed max-frequency measure. Based on the properties of an item, the max-frequency of an item is counted over a sliding window of which the length changes dynamically. Besides being parameterless, this way of measuring the support of items was shown to have the advantage of a faster detection of bursts in a stream, especially if the set of items is heterogeneous. The algorithm that was proposed for maintaining all frequent items, however, scales poorly when the number of items becomes large. Therefore, in this paper we propose, instead of reporting all frequent items, to only mine the top-k most frequent ones. First we prove that in order to solve this problem exactly, we still need a prohibitive amount of memory (at least linear in the number of items). Yet, under some reasonable conditions, we show both theoretically and empirically that a memory-efficient algorithm exists. A prototype of this algorithm is implemented and we present its performance w.r.t. 
memory-efficiency on real-life data and in controlled experiments with synthetic data.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76170472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Fast euclidean minimum spanning tree: algorithm, analysis, and applications
William B. March, P. Ram, Alexander G. Gray
{"title":"Fast euclidean minimum spanning tree: algorithm, analysis, and applications","authors":"William B. March, P. Ram, Alexander G. Gray","doi":"10.1145/1835804.1835882","DOIUrl":"https://doi.org/10.1145/1835804.1835882","url":null,"abstract":"The Euclidean Minimum Spanning Tree problem has applications in a wide range of fields, and many efficient algorithms have been developed to solve it. We present a new, fast, general EMST algorithm, motivated by the clustering and analysis of astronomical data. Large-scale astronomical surveys, including the Sloan Digital Sky Survey, and large simulations of the early universe, such as the Millennium Simulation, can contain millions of points and fill terabytes of storage. Traditional EMST methods scale quadratically, and more advanced methods lack rigorous runtime guarantees. We present a new dual-tree algorithm for efficiently computing the EMST, use adaptive algorithm analysis to prove the tightest (and possibly optimal) runtime bound for the EMST problem to-date, and demonstrate the scalability of our method on astronomical data sets.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77459330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 100
Discovering precursors to aviation safety incidents: from massive data to actionable information
A. Srivastava
{"title":"Discovering precursors to aviation safety incidents: from massive data to actionable information","authors":"A. Srivastava","doi":"10.1145/1866814.1866818","DOIUrl":"https://doi.org/10.1145/1866814.1866818","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77192195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Clustering by synchronization
Christian Böhm, C. Plant, Junming Shao, Qinli Yang
{"title":"Clustering by synchronization","authors":"Christian Böhm, C. Plant, Junming Shao, Qinli Yang","doi":"10.1145/1835804.1835879","DOIUrl":"https://doi.org/10.1145/1835804.1835879","url":null,"abstract":"Synchronization is a powerful basic concept in nature regulating a large variety of complex processes ranging from the metabolism in the cell to social behavior in groups of individuals. Therefore, synchronization phenomena have been extensively studied and models robustly capturing the dynamical synchronization process have been proposed, e.g. the Extensive Kuramoto Model. Inspired by the powerful concept of synchronization, we propose Sync, a novel approach to clustering. The basic idea is to view each data object as a phase oscillator and simulate the interaction behavior of the objects over time. As time evolves, similar objects naturally synchronize together and form distinct clusters. Inherited from synchronization, Sync has several desirable properties: The clusters revealed by dynamic synchronization truly reflect the intrinsic structure of the data set, Sync does not rely on any distribution assumption and allows detecting clusters of arbitrary number, shape and size. Moreover, the concept of synchronization allows natural outlier handling, since outliers do not synchronize with cluster objects. For fully automatic clustering, we propose to combine Sync with the Minimum Description Length principle. 
Extensive experiments on synthetic and real world data demonstrate the effectiveness and efficiency of our approach.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84337036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 80
Semi-supervised feature selection for graph classification
Xiangnan Kong, Philip S. Yu
{"title":"Semi-supervised feature selection for graph classification","authors":"Xiangnan Kong, Philip S. Yu","doi":"10.1145/1835804.1835905","DOIUrl":"https://doi.org/10.1145/1835804.1835905","url":null,"abstract":"The problem of graph classification has attracted great interest in the last decade. Current research on graph classification assumes the existence of large amounts of labeled training graphs. However, in many applications, the labels of graph data are very expensive or difficult to obtain, while there are often copious amounts of unlabeled graph data available. In this paper, we study the problem of semi-supervised feature selection for graph classification and propose a novel solution, called gSSC, to efficiently search for optimal subgraph features with labeled and unlabeled graphs. Different from existing feature selection methods in vector spaces which assume the feature set is given, we perform semi-supervised feature selection for graph data in a progressive way together with the subgraph feature mining process. We derive a feature evaluation criterion, named gSemi, to estimate the usefulness of subgraph features based upon both labeled and unlabeled graphs. Then we propose a branch-and-bound algorithm to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space. 
Empirical studies on several real-world tasks demonstrate that our semi-supervised feature selection approach can effectively boost graph classification performances with semi-supervised feature selection and is very efficient by pruning the subgraph search space using both labeled and unlabeled graphs.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78494335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 122
A hierarchical information theoretic technique for the discovery of non linear alternative clusterings
Xuan-Hong Dang, J. Bailey
{"title":"A hierarchical information theoretic technique for the discovery of non linear alternative clusterings","authors":"Xuan-Hong Dang, J. Bailey","doi":"10.1145/1835804.1835878","DOIUrl":"https://doi.org/10.1145/1835804.1835878","url":null,"abstract":"Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non linear shapes. Our goal in this paper is to address this challenge of non linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing, reference clustering. Our technique is information theory based and aims to ensure alternative clustering quality by maximizing the mutual information between clustering labels and data observations, whilst at the same time ensuring alternative clustering distinctiveness by minimizing the information sharing between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms in the literature. 
We show our technique's performance is generally better for non-linear scenarios and furthermore, is highly competitive even for simpler, linear scenarios.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88554295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
Diagnosing memory leaks using graph mining on heap dumps
Evan K. Maxwell, Godmar Back, Naren Ramakrishnan
{"title":"Diagnosing memory leaks using graph mining on heap dumps","authors":"Evan K. Maxwell, Godmar Back, Naren Ramakrishnan","doi":"10.1145/1835804.1835822","DOIUrl":"https://doi.org/10.1145/1835804.1835822","url":null,"abstract":"Memory leaks are caused by software programs that prevent the reclamation of memory that is no longer in use. They can cause significant slowdowns, exhaustion of available storage space and, eventually, application crashes. Detecting memory leaks is challenging because real-world applications are built on multiple layers of software frameworks, making it difficult for a developer to know whether observed references to objects are legitimate or the cause of a leak. We present a graph mining solution to this problem wherein we analyze heap dumps to automatically identify subgraphs which could represent potential memory leak sources. Although heap dumps are commonly analyzed in existing heap profiling tools, our work is the first to apply a graph grammar mining solution to this problem. Unlike classical graph mining work, we show that it suffices to mine the dominator tree of the heap dump, which is significantly smaller than the underlying graph. Our approach identifies not just leaking candidates and their structure, but also provides aggregate information about the access path to the leaks. 
We demonstrate several synthetic as well as real-world examples of heap dumps for which our approach provides more insight into the problem than state-of-the-art tools such as Eclipse's MAT.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86224636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
Unifying dependent clustering and disparate clustering for non-homogeneous data
M. S. Hossain, S. Tadepalli, L. Watson, I. Davidson, R. Helm, Naren Ramakrishnan
{"title":"Unifying dependent clustering and disparate clustering for non-homogeneous data","authors":"M. S. Hossain, S. Tadepalli, L. Watson, I. Davidson, R. Helm, Naren Ramakrishnan","doi":"10.1145/1835804.1835880","DOIUrl":"https://doi.org/10.1145/1835804.1835880","url":null,"abstract":"Modern data mining settings involve a combination of attribute-valued descriptors over entities as well as specified relationships between these entities. We present an approach to cluster such non-homogeneous datasets by using the relationships to impose either dependent clustering or disparate clustering constraints. Unlike prior work that views constraints as boolean criteria, we present a formulation that allows constraints to be satisfied or violated in a smooth manner. This enables us to achieve dependent clustering and disparate clustering using the same optimization framework by merely maximizing versus minimizing the objective function. We present results on both synthetic data as well as several real-world datasets.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84467970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Balanced allocation with succinct representation
S. Alaei, Ravi Kumar, Azarakhsh Malekian, Erik Vee
{"title":"Balanced allocation with succinct representation","authors":"S. Alaei, Ravi Kumar, Azarakhsh Malekian, Erik Vee","doi":"10.1145/1835804.1835872","DOIUrl":"https://doi.org/10.1145/1835804.1835872","url":null,"abstract":"Motivated by applications in guaranteed delivery in computational advertising, we consider the general problem of balanced allocation in a bipartite supply-demand setting. Our formulation captures the notion of deviation from being balanced by a convex penalty function. While this formulation admits a convex programming solution, we strive for more robust and scalable algorithms. For the case of L1 penalty functions we obtain a simple combinatorial algorithm based on min-cost flow in graphs and show how to precompute a linear amount of information such that the allocation along any edge can be approximated in constant time. We then extend our combinatorial solution to any convex function by solving a convex cost flow. These scalable methods may have applications in other contexts stipulating balanced allocation. We study the performance of our algorithms on large real-world graphs and show that they are efficient, scalable, and robust in practice.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90040446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Bharat Rao, Balaji Krishnapuram, A. Tomkins, Qiang Yang
{"title":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","authors":"Bharat Rao, Balaji Krishnapuram, A. Tomkins, Qiang Yang","doi":"10.1145/1835804","DOIUrl":"https://doi.org/10.1145/1835804","url":null,"abstract":"KDD-2010, the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, is being held in Washington, DC, USA, on July 24--28, 2010. KDD is the leading international forum for the exchange of research results and practical experience in the field of knowledge discovery and data mining. As the quantity of data available to organizations and individuals continues to grow rapidly, and the need to extract useful knowledge from them becomes more intense, scientists, government workers and business people turn to the KDD community for solutions. This volume contains a snapshot of a year of developments in this field; we hope you will find it useful and rewarding.\n\nThe KDD-2010 technical program features four parallel research tracks and an industrial / government track. The program also features keynotes from leading creators and consumers of KDD technology, 12 workshops, 12 tutorials and one panel. The 2010 KDD Cup competition focuses on educational data mining to support improvements in the field of computer aided instruction. Dozens of technical demonstrations and exhibits from vendors and other organizations underscore the conference's dual role as the leading industry and academic forum to discuss the advances in this field of research.\n\nThe call for papers attracted 578 research papers and 101 industrial and government submissions from around the world. Each paper was independently reviewed by three members of the program committee for originality, significance, technical quality, and clarity of presentation. 
This year's research track introduced an author-feedback phase in the review process, in which authors were invited to comment on the preliminary reviews that they received. The objective of the feedback phase is to ensure greater transparency and fairness, as the authors' responses are taken into account in a subsequent discussion phase moderated by Senior Program Committee (SPC) members. There was much discussion among the reviewers in the subsequent discussion phase before the final decisions. In the end, the program committee accepted 77 papers for long presentations and 24 papers for short presentations into the research track, representing an aggregated acceptance rate of 17.4%.\n\nThis year's Industry and Government track emphasized the successful uses of KDD technology, including deployed applications incorporating KDD technologies and discoveries of valid, novel, understandable, and demonstrably useful patterns from large datasets in industry and government, as well as emerging applications and technology, including challenges and issues arising from attempts to deploy KDD technology to solve specific industry or government problems. The industry and government track of the conference accepted 11 papers for long presentations and 9 papers for short presentations into the program, representing an aggregated acceptance rate of 19.8%.
\n\nWe are glad to see that the conference remains strongly competitive and o","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90148781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25