2011 IEEE 11th International Conference on Data Mining最新文献_第9页

Detection of Cross-Channel Anomalies from Multiple Data Channels 多数据通道跨通道异常的检测

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.51

Duc-Son Pham, Budhaditya Saha, Dinh Q. Phung, S. Venkatesh

引用次数: 7

How Does Research Evolve? Pattern Mining for Research Meme Cycles 研究是如何进化的?模因循环研究的模式挖掘

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.76

Dan He, Xingquan Zhu, D. S. Parker

引用次数: 5

Maximum Entropy Modelling for Assessing Results on Real-Valued Data 实值数据结果评估的最大熵模型

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.98

Kleanthis-Nikolaos Kontonasios, Jilles Vreeken, T. D. Bie

{"title":"Maximum Entropy Modelling for Assessing Results on Real-Valued Data","authors":"Kleanthis-Nikolaos Kontonasios, Jilles Vreeken, T. D. Bie","doi":"10.1109/ICDM.2011.98","DOIUrl":"https://doi.org/10.1109/ICDM.2011.98","url":null,"abstract":"Statistical assessment of the results of data mining is increasingly recognised as a core task in the knowledge discovery process. It is of key importance in practice, as results that might seem interesting at first glance can often be explained by well-known basic properties of the data. In pattern mining, for instance, such trivial results can be so overwhelming in number that filtering them out is a necessity in order to identify the truly interesting patterns. In this paper, we propose an approach for assessing results on real-valued rectangular databases. More specifically, using our analytical model we are able to statistically assess whether or not a discovered structure may be the trivial result of the row and column marginal distributions in the database. Our main approach is to use the Maximum Entropy principle to fit a background model to the data while respecting its marginal distributions. To find these distributions, we employ an MDL based histogram estimator, and we fit these in our model using efficient convex optimization techniques. Subsequently, our model can be used to calculate probabilities directly, as well as to efficiently sample data with the purpose of assessing results by means of empirical hypothesis testing. Notably, our approach is efficient, parameter-free, and naturally deals with missing values. As such, it represents a well-founded alternative to swap randomisation","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126330185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

ASAP: A Self-Adaptive Prediction System for Instant Cloud Resource Demand Provisioning ASAP:一个即时云资源需求预置的自适应预测系统

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.25

Yexi Jiang, Chang-Shing Perng, Tao Li, Rong N. Chang

引用次数: 83

Finding Robust Itemsets under Subsampling 寻找子抽样下的鲁棒项集

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1145/2656261

Nikolaj Tatti, Fabian Moerchen

{"title":"Finding Robust Itemsets under Subsampling","authors":"Nikolaj Tatti, Fabian Moerchen","doi":"10.1145/2656261","DOIUrl":"https://doi.org/10.1145/2656261","url":null,"abstract":"Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an item set such as closed ness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: closed, free, non-derivable and totally shattered item sets, demonstrating how we can compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and the patterns reported are simply a subset of all patterns with this property as opposed to approximate patterns for which the property does not really hold. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust item sets. We further derive a parameter-free technique for ranking item sets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124922907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

Incremental Elliptical Boundary Estimation for Anomaly Detection in Wireless Sensor Networks 基于增量椭圆边界估计的无线传感器网络异常检测

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.80

Masud Moshtaghi, C. Leckie, S. Karunasekera, J. Bezdek, S. Rajasegarar, M. Palaniswami

{"title":"Incremental Elliptical Boundary Estimation for Anomaly Detection in Wireless Sensor Networks","authors":"Masud Moshtaghi, C. Leckie, S. Karunasekera, J. Bezdek, S. Rajasegarar, M. Palaniswami","doi":"10.1109/ICDM.2011.80","DOIUrl":"https://doi.org/10.1109/ICDM.2011.80","url":null,"abstract":"Wireless Sensor Networks (WSNs) provide a low cost option for gathering spatially dense data from different environments. However, WSNs have limited energy resources that hinder the dissemination of the raw data over the network to a central location. This has stimulated research into efficient data mining approaches, which can exploit the restricted computational capabilities of the sensors to model their normal behavior. Having a normal model of the network, sensors can then forward anomalous measurements to the base station. Most of the current data modeling approaches proposed for WSNs require a fixed offline training period and use batch training in contrast to the real streaming nature of data in these networks. In addition they usually work in stationary environments. In this paper we present an efficient online model construction algorithm that captures the normal behavior of the system. Our model is capable of tracking changes in the data distribution in the monitored environment. We illustrate the proposed algorithm with numerical results on both real-life and simulated data sets, which demonstrate the efficiency and accuracy of our approach compared to existing methods.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"276 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114529237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 41

Patent Maintenance Recommendation with Patent Information Network Model 基于专利信息网络模型的专利维护建议

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.116

Xin Jin, W. Spangler, Ying Chen, Keke Cai, Rui Ma, Li Zhang, X. Wu, Jiawei Han

{"title":"Patent Maintenance Recommendation with Patent Information Network Model","authors":"Xin Jin, W. Spangler, Ying Chen, Keke Cai, Rui Ma, Li Zhang, X. Wu, Jiawei Han","doi":"10.1109/ICDM.2011.116","DOIUrl":"https://doi.org/10.1109/ICDM.2011.116","url":null,"abstract":"Patents are of crucial importance for businesses, because they provide legal protection for the invented techniques, processes or products. A patent can be held for up to 20 years. However, large maintenance fees need to be paid to keep it enforceable. If the patent is deemed not valuable, the owner may decide to abandon it by stopping paying the maintenance fees to reduce the cost. For large companies or organizations, making such decisions is difficult because too many patents need to be investigated. In this paper, we introduce the new patent mining problem of automatic patent maintenance prediction, and propose a systematic solution to analyze patents for recommending patent maintenance decision. We model the patents as a heterogeneous time-evolving information network and propose new patent features to build model for a ranked prediction on whether to maintain or abandon a patent. In addition, a network-based refinement approach is proposed to further improve the performance. We have conducted experiments on the large scale United States Patent and Trademark Office (USPTO) database which contains over four million granted patents. The results show that our technique can achieve high performance.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115109674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 33

Minimizing Seed Set for Viral Marketing 最小化病毒式营销种子集

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.99

Cheng Long, R. C. Wong

{"title":"Minimizing Seed Set for Viral Marketing","authors":"Cheng Long, R. C. Wong","doi":"10.1109/ICDM.2011.99","DOIUrl":"https://doi.org/10.1109/ICDM.2011.99","url":null,"abstract":"Viral marketing has attracted considerable concerns in recent years due to its novel idea of leveraging the social network to propagate the awareness of products. Specifically, viral marketing is to first target a limited number of users (seeds) in the social network by providing incentives, and these targeted users would then initiate the process of awareness spread by propagating the information to their friends via their social relationships. Extensive studies have been conducted for maximizing the awareness spread given the number of seeds. However, all of them fail to consider the common scenario of viral marketing where companies hope to use as few seeds as possible yet influencing at least a certain number of users. In this paper, we propose a new problem, called J-MIN-Seed, whose objective is to minimize the number of seeds while at least J users are influenced. J-MIN-Seed, unfortunately, is proved to be NP-hard in this work. In such case, we develop a greedy algorithm that can provide error guarantees for J-MIN-Seed. Furthermore, for the problem setting where J is equal to the number of all users in the social network, denoted by Full-Coverage, we design other efficient algorithms. Extensive experiments were conducted on real datasets to verify our algorithm.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128352871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 81

Flexible Fault Tolerant Subspace Clustering for Data with Missing Values 缺失值数据的柔性容错子空间聚类

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.70

Stephan Günnemann, Emmanuel Müller, S. Raubach, T. Seidl

引用次数: 22

The Joint Inference of Topic Diffusion and Evolution in Social Communities 社会群体中话题扩散与演化的联合推理

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.144

C. Lin, Q. Mei, Jiawei Han, Yunliang Jiang, Marina Danilevsky

{"title":"The Joint Inference of Topic Diffusion and Evolution in Social Communities","authors":"C. Lin, Q. Mei, Jiawei Han, Yunliang Jiang, Marina Danilevsky","doi":"10.1109/ICDM.2011.144","DOIUrl":"https://doi.org/10.1109/ICDM.2011.144","url":null,"abstract":"The prevalence of Web 2.0 techniques has led to the boom of various online communities, where topics spread ubiquitously among user-generated documents. Working together with this diffusion process is the evolution of topic content, where novel contents are introduced by documents which adopt the topic. Unlike explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, making their discovery challenging. In this paper, we track the evolution of an arbitrary topic and reveal the latent diffusion paths of that topic in a social community. A novel and principled probabilistic model is proposed which casts our task as an joint inference problem, which considers textual documents, social influences, and topic evolution in a unified way. Specifically, a mixture model is introduced to model the generation of text according to the diffusion and the evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field. Experiments on both synthetic data and real world data show that the discovery of topic diffusion and evolution benefits from this joint inference, and the probabilistic model we propose performs significantly better than existing methods.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125616894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 74