2010 IEEE International Conference on Data Mining: Latest Papers, Page 10

How to Do Good Data Mining Research and Get it Published in Top Venues

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.165

Eamonn J. Keogh

Citations: 0

Fast and Flexible Multivariate Time Series Subsequence Search

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.36

Kanishka Bhaduri, Qiang Zhu, N. Oza, A. Srivastava

Abstract: Multivariate time series (MTS) are ubiquitous and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work supports only queries with the same length as the data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R*-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), so that fewer than 5% of the observations require actual disk access. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size used in this paper.
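
To make the query model in this abstract concrete (a pattern over a chosen subset of variables, each with its own time delay), here is a minimal brute-force sketch. It is not the RBS or LBS algorithm from the paper: it uses no index and only a simple early-abandon check. The function name, the Euclidean distance, the threshold semantics, and the assumption of non-negative delays are all illustrative choices, not details taken from the source.

```python
import numpy as np

def naive_mts_search(db, query, variables, delays, threshold):
    """Return start offsets t at which the delayed query matches.

    db:        array of shape (T, V), one column per variable
    query:     array of shape (m, k), one column per queried variable
    variables: k column indices of db to match against
    delays:    k non-negative delays (in samples), relative to t
    threshold: maximum Euclidean distance for a match (assumed metric)
    """
    db = np.asarray(db, dtype=float)
    query = np.asarray(query, dtype=float)
    m = query.shape[0]
    last_start = db.shape[0] - m - max(delays)
    matches = []
    for t in range(last_start + 1):
        dist_sq = 0.0
        for col, (var, delay) in enumerate(zip(variables, delays)):
            window = db[t + delay : t + delay + m, var]
            dist_sq += np.sum((window - query[:, col]) ** 2)
            if dist_sq > threshold ** 2:
                break  # early abandon: this offset cannot match
        else:
            matches.append(t)  # no variable pushed the distance over the limit
    return matches

# Hypothetical data: a pattern on variable 0, echoed on variable 2 two samples later.
db = np.zeros((10, 3))
db[2:5, 0] = [1, 2, 3]
db[4:7, 2] = [4, 5, 6]
query = np.array([[1, 4], [2, 5], [3, 6]])
print(naive_mts_search(db, query, variables=[0, 2], delays=[0, 2], threshold=0.1))  # [2]
```

This scan touches every candidate offset, which is exactly the cost that an index with a high prune rate is meant to avoid.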

Citations: 13

Tru-Alarm: Trustworthiness Analysis of Sensor Networks in Cyber-Physical Systems

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.63

L. Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Chih-Chieh Hung, Wen-Chih Peng

Citations: 95

Learning Markov Network Structure with Decision Trees

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.128

Daniel Lowd, Jesse Davis

Citations: 77

Pseudo Conditional Random Fields: Joint Training Approach to Segmenting and Labeling Sequence Data

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.99

Shing-Kit Chan, Wai Lam

Abstract: Cascaded approaches have long been used to carry out sub-tasks in order to accomplish a major task. We put the cascaded approach in a probabilistic framework and analyze possible reasons for cascaded errors. To reduce the occurrence of cascaded errors, we need to add a constraint when performing joint training. We suggest a pseudo Conditional Random Field (pseudo-CRF) approach that models two sub-tasks as two Conditional Random Fields (CRFs). We then present the formulation in the context of a linear chain CRF for solving problems on sequence data. In conducting joint training for a pseudo-CRF, we reuse all existing well-developed, efficient inference algorithms for a linear chain CRF; joint training would otherwise require approximate inference algorithms or simulations that involve long computation times. Our experimental results show an interesting fact: a jointly trained CRF model in a pseudo-CRF may perform worse than a separately trained CRF on a sub-task, yet the overall system performance of a pseudo-CRF outperforms that of a cascaded approach. We implement the implicit constraint as a soft constraint, so that users can define the penalty cost for violating it. To work on large-scale datasets, we further suggest a parallel implementation of the pseudo-CRF approach, which can run on a multi-core CPU or on a GPU that supports multi-threading. Our experimental results show that it can achieve a 12-fold speedup.

Citations: 0

Document Similarity Self-Join with MapReduce

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.70

R. Baraglia, G. D. F. Morales, C. Lucchese

Citations: 85

Classifier and Cluster Ensembles for Mining Concept Drifting Data Streams

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.125

Peng Zhang, Xingquan Zhu, Jianlong Tan, Li Guo

Abstract: Ensemble learning is a commonly used tool for building prediction models from data streams, owing to its intrinsic ability to handle large volumes of stream data. Despite its extraordinary success in stream data mining, existing ensemble models for stream data mainly fall into the ensemble-of-classifiers category. They overlook the fact that building classifiers requires a labor-intensive labeling process: often only a small number of labeled samples are available to train a few classifiers, while a large number of unlabeled samples are available to build clusters from the data stream. Accordingly, in this paper, we propose a new ensemble model that combines classifiers and clusters for mining data streams. We argue that the main challenges of this new ensemble model are that (1) clusters formed from data streams carry only cluster IDs, with no genuine class label information, and (2) the concept drift underlying data streams makes it even harder to combine clusters and classifiers in one ensemble framework. To handle challenge (1), we present a label propagation method that infers each cluster's class label by making full use of both the class label information from classifiers and the internal structure information from clusters. To handle challenge (2), we present a new weighting scheme that weights all base models according to their consistency with the up-to-date base model. As a result, all classifiers and clusters can be combined for prediction through a weighted average mechanism. Experiments on real-world data streams demonstrate that our method outperforms simple classifier ensembles and cluster ensembles for stream data mining.
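
The weighted-average combination described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' weighting scheme: "consistency with the up-to-date base model" is approximated here by the plain agreement rate of predicted labels, every base model (including clusters whose labels were inferred) is assumed to already output class probabilities, and both function names are hypothetical.

```python
import numpy as np

def consistency_weights(base_labels, reference_labels):
    """Weight each base model by its label agreement with the reference model.

    base_labels:      array (M, N) of labels predicted by M base models
    reference_labels: array (N,) of labels from the most recent base model
    The agreement rate is an assumed stand-in for the paper's consistency measure.
    """
    base_labels = np.asarray(base_labels)
    reference_labels = np.asarray(reference_labels)
    agreement = (base_labels == reference_labels).mean(axis=1)
    total = agreement.sum()
    if total == 0:
        # No model agrees with the reference at all: fall back to uniform weights.
        return np.full(len(base_labels), 1.0 / len(base_labels))
    return agreement / total

def weighted_average_predict(base_probs, weights):
    """Combine class probabilities of shape (M, N, C) using weights of shape (M,)."""
    base_probs = np.asarray(base_probs, dtype=float)
    combined = np.tensordot(weights, base_probs, axes=1)  # shape (N, C)
    return combined.argmax(axis=1)
```

With normalized weights, a base model that disagrees with the most recent model on every sample contributes nothing to the vote, which is one simple way to discount models trained on an outdated concept.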

Citations: 92

K-AP: Generating Specified K Clusters by Efficient Affinity Propagation

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.107

Xiangliang Zhang, Wei Wang, K. Nørvåg, M. Sebag

Citations: 60

Micro-blogging Sentiment Detection by Collaborative Online Learning

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.139

Guangxia Li, S. Hoi, Kuiyu Chang, R. Jain

Citations: 52

Personalizing Web Page Recommendation via Collaborative Filtering and Topic-Aware Markov Model

2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.28

Qingyan Yang, Ju Fan, Jianyong Wang, Lizhu Zhou

Citations: 41