2011 IEEE 11th International Conference on Data Mining: Latest Papers

Detection of Cross-Channel Anomalies from Multiple Data Channels
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.51
Duc-Son Pham, Budhaditya Saha, Dinh Q. Phung, S. Venkatesh
Abstract: We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are those common to the anomalies of the individual channels, and they are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
Citations: 7
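
The two-stage scheme in this abstract (detect anomalies per channel, then keep those shared across channels) can be illustrated with a minimal sketch. It is not the authors' method: the paper uses spectral approaches, whereas this sketch swaps in a plain z-score detector for stage one. The function names and the parameters `z_threshold` and `min_channels` are hypothetical.

```python
from statistics import mean, pstdev


def channel_anomalies(series, z_threshold=3.0):
    """Stage 1 (stand-in detector): time indices whose z-score exceeds z_threshold."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return set()  # a constant channel has no outliers
    return {t for t, x in enumerate(series) if abs(x - mu) / sigma > z_threshold}


def cross_channel_anomalies(channels, min_channels=2, z_threshold=3.0):
    """Stage 2: keep time indices flagged in at least min_channels channels."""
    counts = {}
    for series in channels:
        for t in channel_anomalies(series, z_threshold):
            counts[t] = counts.get(t, 0) + 1
    return sorted(t for t, c in counts.items() if c >= min_channels)


# Example: a spike at t=5 in two channels, and at t=12 in only one channel.
channels = [[0.0] * 20 for _ in range(3)]
channels[0][5] = channels[1][5] = channels[2][12] = 100.0
print(cross_channel_anomalies(channels))  # only t=5 is shared by two channels
```

Raising `min_channels` trades recall for fewer alarms, which loosely mirrors the abstract's point that combining channels is likely to reduce the false alarm rate.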
How Does Research Evolve? Pattern Mining for Research Meme Cycles
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.76
Dan He, Xingquan Zhu, D. S. Parker
Abstract: Recent years have witnessed a great deal of attention to tracking news memes over the web and modeling the ebb and flow of their popularity. One of the most important features of news memes is that they seldom recur; instead, they tend to shift to different but similar memes. In this work, we consider patterns in research memes, which differ significantly from news memes and have received very little attention. One significant difference between research memes and news memes is that research memes develop cyclically, which motivates the need for models of research meme cycles. Furthermore, these cycles may reveal important patterns of evolving research, shedding light on how research progresses. In this paper, we formulate the modeling of research meme cycles and propose solutions to the problems of identifying cycles and discovering patterns among them. Experiments on two different application domains indicate that our model finds meaningful patterns and that our pattern discovery algorithms are efficient for large-scale data analysis.
Citations: 5
Maximum Entropy Modelling for Assessing Results on Real-Valued Data
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.98
Kleanthis-Nikolaos Kontonasios, Jilles Vreeken, T. D. Bie
Abstract: Statistical assessment of the results of data mining is increasingly recognised as a core task in the knowledge discovery process. It is of key importance in practice, as results that might seem interesting at first glance can often be explained by well-known basic properties of the data. In pattern mining, for instance, such trivial results can be so numerous that filtering them out is necessary to identify the truly interesting patterns. In this paper, we propose an approach for assessing results on real-valued rectangular databases. More specifically, our analytical model lets us statistically assess whether a discovered structure may be a trivial consequence of the row and column marginal distributions of the database. Our main approach is to use the Maximum Entropy principle to fit a background model to the data while respecting its marginal distributions. To find these distributions, we employ an MDL-based histogram estimator, and we fit them into our model using efficient convex optimization techniques. The resulting model can be used to calculate probabilities directly, as well as to sample data efficiently in order to assess results by means of empirical hypothesis testing. Notably, our approach is efficient, parameter-free, and naturally deals with missing values. As such, it represents a well-founded alternative to swap randomisation.
Citations: 20
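
The idea of a Maximum Entropy background model that respects row and column margins is easiest to see in the binary-matrix analogue, sketched below under that simplifying assumption (the paper itself handles real-valued data with MDL-estimated histograms). For constraints on the expected row and column sums, the maximum entropy distribution factorizes into independent Bernoulli cells with P(D[i][j] = 1) = sigmoid(lam[i] + mu[j]); the sketch fits the hypothetical parameters `lam` and `mu` by alternating one-dimensional bisection solves.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _solve_1d(target, others, lo=-50.0, hi=50.0, iters=100):
    """Find a such that sum(sigmoid(a + b) for b in others) == target (bisection)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(_sigmoid(mid + b) for b in others) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def fit_maxent_binary(row_sums, col_sums, sweeps=500):
    """Return cell probabilities whose expected row/column sums match the margins.

    Margins must lie strictly inside the feasible range (0 < r < n_cols, etc.).
    """
    lam = [0.0] * len(row_sums)
    mu = [0.0] * len(col_sums)
    for _ in range(sweeps):
        # Block coordinate ascent: solve each row given mu, then each column given lam.
        lam = [_solve_1d(r, mu) for r in row_sums]
        mu = [_solve_1d(c, lam) for c in col_sums]
    return [[_sigmoid(l + m) for m in mu] for l in lam]


P = fit_maxent_binary([1, 2, 1], [2, 1, 1])
for row in P:
    print([round(p, 3) for p in row])
```

The fitted cell probabilities can then be sampled to generate random matrices for empirical hypothesis testing, which is the role the abstract assigns to the background model.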
ASAP: A Self-Adaptive Prediction System for Instant Cloud Resource Demand Provisioning
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.25
Yexi Jiang, Chang-Shing Perng, Tao Li, Rong N. Chang
Abstract: The promise of cloud computing is to provide computing resources instantly whenever they are needed. State-of-the-art virtual machine (VM) provisioning technology can take tens of minutes to provision a VM. This latency is unacceptable for jobs that need to scale out during computation. To truly enable on-the-fly scaling, new VMs need to be ready within seconds of a request. In this paper, we present ASAP, an online temporal data mining system that models and predicts cloud VM demand. ASAP aims to extract high-level characteristics from the VM provisioning request stream and to notify the provisioning system to prepare VMs in advance. To quantify prediction quality, we propose the Cloud Prediction Cost, which encodes the costs and constraints of the cloud and guides the training of prediction algorithms. Moreover, we use a two-level ensemble method to capture the characteristics of highly transient demand time series. Experimental results using historical data from an IBM cloud in operation demonstrate that ASAP significantly improves cloud service quality and makes on-the-fly provisioning possible.
Citations: 83
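
To illustrate cost-aware demand prediction with an ensemble, the sketch below weights three simple forecasters by the inverse of their accumulated past cost. It is a hypothetical, single-level stand-in: the asymmetric `cost` function (under-provisioning penalized four times more than over-provisioning) is an assumption, not the paper's Cloud Prediction Cost, and ASAP's two-level ensemble is not reproduced.

```python
def cost(predicted, actual, under_penalty=4.0, over_penalty=1.0):
    """Hypothetical asymmetric cost: unmet demand costs more than idle VMs."""
    gap = actual - predicted
    return under_penalty * gap if gap > 0 else over_penalty * -gap


def last_value(history):
    return history[-1]


def window_mean(history, w=3):
    recent = history[-w:]
    return sum(recent) / len(recent)


def linear_trend(history):
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]  # extrapolate the last step


PREDICTORS = [last_value, window_mean, linear_trend]


def ensemble_forecast(history, warmup=2, eps=1e-9):
    """Weight each predictor by the inverse of its cumulative one-step-ahead cost."""
    totals = [0.0] * len(PREDICTORS)
    for t in range(warmup, len(history)):
        for k, predict in enumerate(PREDICTORS):
            totals[k] += cost(predict(history[:t]), history[t])
    weights = [1.0 / (c + eps) for c in totals]
    forecasts = [predict(history) for predict in PREDICTORS]
    return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)


# Example: on a steadily growing request count, the trend forecaster dominates.
print(round(ensemble_forecast(list(range(1, 11))), 3))
```

Because the cost is asymmetric, forecasters that tend to under-predict accumulate cost faster and lose weight sooner, which is the kind of bias a provisioning system would want.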
Finding Robust Itemsets under Subsampling
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1145/2656261
Nikolaj Tatti, Fabian Moerchen
Abstract: Mining frequent patterns is plagued by pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this paper, we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset, such as closedness or non-derivability. The robustness of a property is the probability that the property holds on random subsets of the original data. We study four properties (closed, free, non-derivable and totally shattered itemsets), demonstrating how the robustness can be computed analytically without actually sampling the data. Our concept of robustness has many advantages. Unlike statistical approaches to pattern reduction, we do not assume a null hypothesis or any noise model, and the reported patterns are simply a subset of all patterns with the property, as opposed to approximate patterns for which the property does not really hold. If the underlying property is monotonic, then the measure is also monotonic, allowing us to mine robust itemsets efficiently. We further derive a parameter-free technique for ranking itemsets that can be used in top-k approaches. Our experiments demonstrate that the robustness measure successfully reduces the number of patterns and that the ranking yields interesting itemsets.
Citations: 17
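
The robustness definition (the probability that a property holds on a random subset of the data) can be checked by brute force on toy data. The sketch below assumes each transaction is kept independently with probability `alpha` and uses closedness as the property; it enumerates all 2^n subsamples, so it only illustrates the definition, whereas the paper computes robustness analytically without sampling. All function names are hypothetical.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersection of all transactions containing itemset, or None if none do."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    result = set(covering[0])
    for t in covering[1:]:
        result &= t
    return result


def is_closed(itemset, transactions):
    # Closed: supported, and no item outside the itemset appears in every supporting transaction.
    return closure(itemset, transactions) == set(itemset)


def robustness(itemset, transactions, alpha, prop=is_closed):
    """P(prop holds) when each transaction is kept independently with probability alpha.

    Exact but exponential in len(transactions): toy data only.
    """
    n = len(transactions)
    total = 0.0
    for k in range(n + 1):
        weight = alpha ** k * (1 - alpha) ** (n - k)
        for kept in combinations(transactions, k):
            if prop(itemset, list(kept)):
                total += weight
    return total


data = [{"a", "b"}, {"a"}]
# {"a"} stays closed exactly when the transaction {"a"} survives subsampling.
print(robustness({"a"}, data, 0.5))
```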
Incremental Elliptical Boundary Estimation for Anomaly Detection in Wireless Sensor Networks
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.80
Masud Moshtaghi, C. Leckie, S. Karunasekera, J. Bezdek, S. Rajasegarar, M. Palaniswami
Abstract: Wireless Sensor Networks (WSNs) provide a low-cost option for gathering spatially dense data from different environments. However, WSNs have limited energy resources, which hinders transmitting the raw data over the network to a central location. This has stimulated research into efficient data mining approaches that can exploit the limited computational capabilities of the sensors to model their normal behavior. With a model of the network's normal behavior, sensors can then forward anomalous measurements to the base station. Most current data modeling approaches proposed for WSNs require a fixed offline training period and use batch training, in contrast to the streaming nature of data in these networks. In addition, they usually work in stationary environments. In this paper, we present an efficient online model construction algorithm that captures the normal behavior of the system. Our model is capable of tracking changes in the data distribution of the monitored environment. We illustrate the proposed algorithm with numerical results on both real-life and simulated data sets, which demonstrate the efficiency and accuracy of our approach compared to existing methods.
Citations: 41
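
An online elliptical boundary can be sketched with a running mean and covariance (Welford's update) and a squared Mahalanobis-distance threshold. This is a simplified, hypothetical illustration for 2-D readings, not the authors' incremental algorithm; in particular it keeps cumulative statistics with no forgetting, so unlike the model in the abstract it does not track changes in the data distribution. The default threshold of 5.991 is approximately the 95% quantile of a chi-square distribution with 2 degrees of freedom.

```python
class IncrementalEllipse:
    """Running mean/covariance of 2-D readings with a Mahalanobis-distance boundary."""

    def __init__(self, threshold=5.991):
        self.threshold = threshold  # approx. chi-square(2) 95% quantile
        self.n = 0
        self.mx = self.my = 0.0
        self.sxx = self.syy = self.sxy = 0.0  # sums of centered products

    def update(self, x, y):
        # Welford's single-pass update; no raw readings are stored.
        self.n += 1
        dx = x - self.mx
        dy = y - self.my
        self.mx += dx / self.n
        self.my += dy / self.n
        self.sxx += dx * (x - self.mx)
        self.syy += dy * (y - self.my)
        self.sxy += dx * (y - self.my)

    def distance2(self, x, y):
        """Squared Mahalanobis distance of (x, y) from the current mean."""
        if self.n < 2:
            raise ValueError("need at least two readings")
        cxx = self.sxx / (self.n - 1)
        cyy = self.syy / (self.n - 1)
        cxy = self.sxy / (self.n - 1)
        det = cxx * cyy - cxy * cxy
        if det <= 0:
            raise ValueError("covariance is singular")
        dx, dy = x - self.mx, y - self.my
        # Quadratic form using the closed-form inverse of a 2x2 covariance matrix.
        return (cyy * dx * dx - 2 * cxy * dx * dy + cxx * dy * dy) / det

    def is_anomaly(self, x, y):
        return self.distance2(x, y) > self.threshold


e = IncrementalEllipse()
for x, y in [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]:
    e.update(x, y)
print(e.is_anomaly(5, 5), e.is_anomaly(0.5, 0.5))
```

Only six numbers per sensor are kept, which is the property that makes this style of model attractive on energy-constrained nodes.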
Patent Maintenance Recommendation with Patent Information Network Model
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.116
Xin Jin, W. Spangler, Ying Chen, Keke Cai, Rui Ma, Li Zhang, X. Wu, Jiawei Han
Abstract: Patents are of crucial importance to businesses because they provide legal protection for invented techniques, processes or products. A patent can be held for up to 20 years. However, large maintenance fees must be paid to keep it enforceable. If a patent is deemed not valuable, the owner may decide to abandon it by ceasing to pay the maintenance fees, thereby reducing cost. For large companies or organizations, making such decisions is difficult because too many patents need to be investigated. In this paper, we introduce the new patent mining problem of automatic patent maintenance prediction and propose a systematic solution that analyzes patents to recommend maintenance decisions. We model patents as a heterogeneous, time-evolving information network and propose new patent features to build a model for ranked prediction of whether to maintain or abandon a patent. In addition, we propose a network-based refinement approach to further improve performance. We conducted experiments on the large-scale United States Patent and Trademark Office (USPTO) database, which contains over four million granted patents. The results show that our technique can achieve high performance.
Citations: 33
Minimizing Seed Set for Viral Marketing
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.99
Cheng Long, R. C. Wong
Abstract: Viral marketing has attracted considerable attention in recent years due to its novel idea of leveraging social networks to propagate awareness of products. Specifically, viral marketing first targets a limited number of users (seeds) in the social network by providing incentives; these targeted users then initiate the spread of awareness by propagating the information to their friends via their social relationships. Extensive studies have been conducted on maximizing the awareness spread given the number of seeds. However, none of them considers the common viral marketing scenario in which companies hope to use as few seeds as possible while still influencing at least a certain number of users. In this paper, we propose a new problem, called J-MIN-Seed, whose objective is to minimize the number of seeds while influencing at least J users. We prove that J-MIN-Seed is NP-hard, and we therefore develop a greedy algorithm that provides error guarantees for J-MIN-Seed. Furthermore, for the setting where J equals the number of all users in the social network, denoted Full-Coverage, we design other efficient algorithms. Extensive experiments were conducted on real datasets to verify our algorithms.
Citations: 81
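
The greedy strategy for J-MIN-Seed can be sketched as repeatedly adding the user with the largest marginal gain until at least J users are influenced. To keep the sketch deterministic, it assumes a hypothetical one-hop influence model (a seed influences itself and its direct friends), which turns the problem into greedy set cover; the paper's diffusion model and error guarantees are not reproduced here.

```python
def influenced(seeds, graph):
    """Hypothetical one-hop model: each seed influences itself and its direct friends."""
    reached = set(seeds)
    for s in seeds:
        reached.update(graph.get(s, ()))
    return reached


def greedy_min_seeds(graph, J):
    """Greedily add the user with the largest marginal gain until >= J users are reached."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    if J > len(nodes):
        raise ValueError("J exceeds the number of users")
    seeds, reached = [], set()
    while len(reached) < J:
        # Some unreached user is always a candidate, so the best gain is at least 1.
        candidates = [v for v in sorted(nodes) if v not in seeds]
        best = max(candidates, key=lambda v: len(influenced([v], graph) - reached))
        seeds.append(best)
        reached |= influenced([best], graph)
    return seeds


friends = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
print(greedy_min_seeds(friends, 6))  # Full-Coverage on this toy graph
```

Ties are broken by the smallest user id, because `max` returns the first maximal candidate in sorted order.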
Flexible Fault Tolerant Subspace Clustering for Data with Missing Values
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.70
Stephan Günnemann, Emmanuel Müller, S. Raubach, T. Seidl
Abstract: In today's applications, data analysis tasks are hindered by the many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by detecting clusters in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects that belong to multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition that enhances subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering; we therefore prove novel monotonicity properties that enable efficient computation of fault-tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high-quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.
Citations: 22
The Joint Inference of Topic Diffusion and Evolution in Social Communities
2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.144
C. Lin, Q. Mei, Jiawei Han, Yunliang Jiang, Marina Danilevsky
Abstract: The prevalence of Web 2.0 techniques has led to a boom in online communities, where topics spread ubiquitously among user-generated documents. Alongside this diffusion process, topic content evolves as the documents that adopt a topic introduce novel content. Unlike explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, making their discovery challenging. In this paper, we track the evolution of an arbitrary topic and reveal its latent diffusion paths in a social community. We propose a novel, principled probabilistic model that casts this task as a joint inference problem, considering textual documents, social influences and topic evolution in a unified way. Specifically, a mixture model generates text according to the diffusion and evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field. Experiments on both synthetic and real-world data show that the discovery of topic diffusion and evolution benefits from this joint inference, and that the proposed probabilistic model performs significantly better than existing methods.
Citations: 74