2010 IEEE International Conference on Data Mining: Latest Publications

Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.21
D. Sart, A. Mueen, W. Najjar, Eamonn J. Keogh, V. Niennattrakul
Abstract: Many time series data mining problems require subsequence similarity search as a subroutine. Dozens of similarity/distance measures have been proposed in the last decade and there is increasing evidence that Dynamic Time Warping (DTW) is the best measure across a wide range of domains. Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy. Proposed speedup techniques include early abandoning strategies, lower-bound based pruning, indexing and embedding. In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions. With this motivation, we investigate both GPU (Graphics Processing Unit) and FPGA (Field Programmable Gate Array) based acceleration of subsequence similarity search under the DTW measure. As we shall show, our novel algorithms allow GPUs to achieve two orders of magnitude speedup and FPGAs to produce four orders of magnitude speedup. We conduct detailed case studies on the classification of astronomical observations and demonstrate that our ideas allow us to tackle problems that would be untenable otherwise.
Citations: 131
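For orientation only, the quadratic dynamic program that this paper offloads to GPUs and FPGAs fits in a few lines. The sketch below is a plain, unoptimized Python version with a brute-force subsequence scan; it is not the authors' hardware implementation, and the function names are illustrative. The software baselines the abstract mentions add lower bounding and early abandoning on top of this inner loop.

```python
import numpy as np

def dtw_distance(a, b):
    """Naive O(n*m) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])

def best_match(query, series):
    """Subsequence search: slide the query over a long series, keep the best match."""
    best_dist, best_pos = np.inf, -1
    m = len(query)
    for start in range(len(series) - m + 1):
        d = dtw_distance(query, series[start:start + m])
        if d < best_dist:
            best_dist, best_pos = d, start
    return best_pos, best_dist
```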
Separation of Interleaved Web Sessions with Heuristic Search
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.43
Marko Pozenel, V. Mahnic, M. Kukar
Abstract: We describe a heuristic search-based method for interleaved HTTP (Web) session reconstruction, built upon first-order Markov models. An interleaved session is generated by a user who is concurrently browsing the same web site in two or more web sessions (browser tabs or windows). In order to assure data quality for subsequent phases in analyzing a user's browsing behavior, such sessions need to be separated in advance. We propose a separation process based on best-first search and trained first-order Markov chains. We develop a testing method based on various measures of the similarity of reconstructed sessions to the original ones. We evaluate the developed method on two real-world clickstream data sources: a web shop and a university student records information system. Preliminary results show that the proposed method performs well.
Citations: 8
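To make the core idea concrete, here is a deliberately simplified greedy variant, not the authors' best-first search: each incoming click is appended to whichever open session's last page has the highest trained first-order transition probability to the new page, or it starts a new session when no transition is likely enough. The transition table and the new_session_prob threshold are assumed inputs.

```python
def separate_clicks(clicks, trans_prob, new_session_prob=0.01):
    """clicks: ordered list of page ids from one user's interleaved stream.
    trans_prob: dict mapping (prev_page, next_page) -> trained transition probability."""
    sessions = []                      # each session is a list of page ids
    for page in clicks:
        scores = [trans_prob.get((s[-1], page), 0.0) for s in sessions]
        if sessions and max(scores) > new_session_prob:
            # continue the open session that best explains this click
            sessions[scores.index(max(scores))].append(page)
        else:
            sessions.append([page])    # start a new interleaved session
    return sessions
```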
Scalable Influence Maximization in Social Networks under the Linear Threshold Model
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.118
Wei Chen, Yifei Yuan, Li Zhang
Abstract: Influence maximization is the problem of finding a small set of most influential nodes in a social network so that their aggregated influence in the network is maximized. In this paper, we study influence maximization in the linear threshold model, one of the important models formalizing the behavior of influence propagation in social networks. We first show that computing exact influence in general networks in the linear threshold model is #P-hard, which closes an open problem left in the seminal work on influence maximization by Kempe, Kleinberg, and Tardos, 2003. In contrast, we show that computing influence in directed acyclic graphs (DAGs) can be done in time linear in the size of the graphs. Based on the fast computation in DAGs, we propose the first scalable influence maximization algorithm tailored for the linear threshold model. We conduct extensive simulations to show that our algorithm is scalable to networks with millions of nodes and edges, is orders of magnitude faster than the greedy approximation algorithm proposed by Kempe et al. and its optimized versions, and performs consistently among the best algorithms, while other heuristic algorithms not designed specifically for the linear threshold model have unstable performance on different real-world networks.
Citations: 878
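As background for the model this paper targets, a single Monte Carlo realization of linear threshold diffusion can be simulated as below; greedy algorithms in the Kempe et al. line estimate a seed set's expected spread by averaging many such runs. This is a generic sketch of the LT model, not the paper's DAG-based algorithm, and the edge representation is an assumption.

```python
import random

def linear_threshold_spread(seeds, in_edges, rng=None):
    """One LT realization. in_edges: dict mapping node -> list of (in_neighbor, weight),
    with incoming weights at each node summing to at most 1."""
    rng = rng or random.Random(0)
    thresholds = {v: rng.random() for v in in_edges}   # uniform random thresholds
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in in_edges.items():
            if v in active:
                continue
            # a node activates once its active in-neighbors' weights reach its threshold
            if sum(w for u, w in incoming if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active   # influence spread of the seed set in this realization
```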
MoodCast: Emotion Prediction via Dynamic Continuous Factor Graph Model
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.105
Yuan Zhang, Jie Tang, Jimeng Sun, Yiran Chen, Jinghai Rao
Abstract: Human emotion is one important underlying force affecting and affected by the dynamics of social networks. An interesting question is "can we predict a person's mood based on his historic emotion log and his social network?". In this paper, we propose MoodCast, a method based on a dynamic continuous factor graph model for modeling and predicting users' emotions in a social network. MoodCast incorporates users' dynamic status information (e.g., locations, activities, and attributes) and social influence from users' friends into a unified model. Based on the historical information (e.g., network structure and users' status from time 0 to t−1), MoodCast learns a discriminative model for predicting users' emotion status at time t. To the best of our knowledge, this work takes the first step in designing a principled model for emotion prediction in social networks. Our experimental results on both a real social network and a virtual web-based network show that we can accurately predict the emotion status of more than 62% of users, an improvement of more than 8% over the baseline methods.
Citations: 52
A Variance Reduction Framework for Stable Feature Selection
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1002/sam.11152
Yue Han, Lei Yu
Abstract: Besides high accuracy, the stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin-based instance weighting, which weights training instances according to their influence on the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world microarray data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.
Citations: 71
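The abstract does not spell out the weighting rule, so the following is only a hypothetical illustration of the general idea: compute a Relief-style hypothesis margin for each training instance and downweight instances with atypical margins, so that feature-relevance estimates fluctuate less across training samples. Both functions and the specific weighting formula are assumptions, not the authors' scheme.

```python
import numpy as np

def hypothesis_margins(X, y):
    """Relief-style hypothesis margin: distance to nearest miss minus distance to
    nearest hit. Assumes every class has at least two instances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    margins = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        nearest_hit = d[y == y[i]].min()
        nearest_miss = d[y != y[i]].min()
        margins[i] = nearest_miss - nearest_hit
    return margins

def instance_weights(X, y):
    m = hypothesis_margins(X, y)
    # instances whose margins deviate strongly from the average contribute more
    # variance to relevance estimates, so give them smaller weights (one plausible
    # choice, not necessarily the paper's)
    dev = np.abs(m - m.mean())
    return 1.0 / (1.0 + dev)
```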
Transfer Learning via Cluster Correspondence Inference
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.146
Mingsheng Long, Wei-min Cheng, Xiaoming Jin, Jianmin Wang, Dou Shen
Abstract: Transfer learning aims to leverage knowledge from one domain for tasks in a new domain. It finds abundant applications, such as text/sentiment classification. Many previous works are based on cluster analysis and assume some common clusters shared by both domains. They mainly focus on a one-to-one cluster correspondence to bridge different domains. However, such a correspondence scheme might be too strong for real applications, where each cluster in one domain corresponds to many clusters in the other domain. In this paper, we propose a Cluster Correspondence Inference (CCI) method to iteratively infer many-to-many correspondences among clusters from different domains. Specifically, word clusters and document clusters are extracted for each domain using nonnegative matrix factorization, then the word clusters from different domains are corresponded in a many-to-many scheme, with the shared word space as a bridge. These two steps are run iteratively, and label information is transferred from the source domain to the target domain through the inferred cluster correspondence. Experiments on various real data sets demonstrate that our method outperforms several state-of-the-art approaches for cross-domain text classification.
Citations: 13
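To illustrate the building blocks only, not the CCI algorithm itself: each domain's word-by-document matrix can be factored with NMF, and the resulting word-cluster columns compared across domains in the shared word space to obtain a soft many-to-many correspondence. The use of scikit-learn, the similarity threshold, and the function name are assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

def word_cluster_correspondence(X_src, X_tgt, k=5, threshold=0.3):
    """X_src, X_tgt: nonnegative word-by-document matrices that share the same
    vocabulary (rows). Returns a boolean k x k many-to-many correspondence matrix."""
    # columns of W are word clusters expressed over the shared vocabulary
    W_src = NMF(n_components=k, init="nndsvda", random_state=0).fit_transform(X_src)
    W_tgt = NMF(n_components=k, init="nndsvda", random_state=0).fit_transform(X_tgt)
    # compare source and target word clusters in the shared word space
    sim = cosine_similarity(W_src.T, W_tgt.T)
    # a cluster may correspond to several clusters in the other domain
    return sim >= threshold
```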
Constraint Based Dimension Correlation and Distance Divergence for Clustering High-Dimensional Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.15
Xianchao Zhang, Yao Wu, Yang Qiu
Abstract: Clusters are hidden in subspaces of high-dimensional data, i.e., only a subset of features is relevant for each cluster. Subspace clustering is challenging since the search for the relevant features of each cluster and the detection of the final clusters are circularly dependent and should be solved simultaneously. In this paper, we point out that feature correlation and distance divergence are important to subspace clustering, but neither has been considered in previous works. Feature correlation groups correlated features independently and thus helps to reduce the search space for the relevant-feature search problem. Distance divergence distinguishes distances on different dimensions and helps to find the final clusters accurately. We tackle the two problems with the aid of a small amount of domain knowledge in the form of must-links and cannot-links. We then devise a semi-supervised subspace clustering algorithm, CDCDD. CDCDD integrates our solutions to the feature correlation and distance divergence problems, and uses an adaptive dimension voting scheme derived from a previous unsupervised subspace clustering algorithm, FINDIT. Experimental results on both synthetic data sets and real data sets show that the proposed CDCDD algorithm outperforms FINDIT in terms of accuracy, and outperforms the other constraint-based algorithm, SCMINER, in terms of both accuracy and efficiency.
Citations: 6
Active Learning from Multiple Noisy Labelers with Varied Costs
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.147
Yaling Zheng, Stephen Scott, Kun Deng
Abstract: In active learning, where a learning algorithm has to purchase the labels of its training examples, it is often assumed that there is only one labeler available to label examples, and that this labeler is noise-free. In reality, it is possible that there are multiple labelers available (such as human labelers in the online annotation tool Amazon Mechanical Turk) and that each such labeler has a different cost and accuracy. We address the active learning problem with multiple labelers, where each labeler has a different (known) cost and a different (unknown) accuracy. Our approach uses the idea of adjusted cost, which allows labelers with different costs and accuracies to be directly compared. This allows our algorithm to find low-cost combinations of labelers that result in high-accuracy labelings of instances. Our algorithm further reduces costs by pruning underperforming labelers from the set under consideration, and by halting the process of estimating the accuracy of the labelers as early as it can. We found that our algorithm often outperforms, and is always competitive with, other algorithms in the literature.
Citations: 45
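The abstract does not give the adjusted-cost formula, so the sketch below uses one natural but assumed reading: a labeler's cost divided by its estimated accuracy above chance, which makes cheap-but-noisy and expensive-but-accurate labelers directly comparable. The greedy selection loop, the chance level, and both function names are illustrative assumptions, not the paper's algorithm.

```python
def adjusted_cost(cost, est_accuracy, chance=0.5):
    """Cost per unit of estimated accuracy above random guessing (assumed definition)."""
    edge = max(est_accuracy - chance, 1e-9)
    return cost / edge

def cheapest_useful_labelers(labelers, budget):
    """labelers: list of (name, cost, estimated_accuracy) tuples.
    Greedily pick labelers with the best adjusted cost until the budget runs out."""
    ranked = sorted(labelers, key=lambda l: adjusted_cost(l[1], l[2]))
    chosen, spent = [], 0.0
    for name, cost, acc in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen
```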
Bonsai: Growing Interesting Small Trees
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.86
Stephan Seufert, Srikanta J. Bedathur, Julián Mestre, G. Weikum
Abstract: Graphs are increasingly used to model a variety of loosely structured data such as biological or social networks and entity relationships. Given this profusion of large-scale graph data, efficiently discovering interesting substructures buried within is essential. These substructures are typically used in determining subsequent actions, such as conducting visual analytics by humans or designing expensive biomedical experiments. In such settings, it is often desirable to constrain the size of the discovered results in order to directly control the associated costs. In this paper, we address the problem of finding cardinality-constrained connected subtrees in large node-weighted graphs that maximize the sum of weights of the selected nodes. We provide an efficient constant-factor approximation algorithm for this strongly NP-hard problem. Our techniques can be applied in a wide variety of application settings, for example in differential analysis of graphs, a problem that frequently arises in bioinformatics but also has applications on the web.
Citations: 7
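To state the optimization problem concretely, the naive greedy baseline below grows a connected set of at most k nodes by repeatedly attaching the heaviest node adjacent to the current tree. It carries no approximation guarantee and is not the paper's constant-factor algorithm; the adjacency-dict representation is an assumption.

```python
def greedy_heavy_tree(adj, weight, k):
    """adj: dict node -> set of neighbor nodes; weight: dict node -> nonnegative weight.
    Returns a connected node set of size <= k and its total weight (tree edges are
    implicit: each added node attaches to some neighbor already in the set)."""
    root = max(weight, key=weight.get)      # start from the heaviest node
    tree = {root}
    while len(tree) < k:
        frontier = {v for u in tree for v in adj[u]} - tree
        if not frontier:
            break                           # the component is exhausted
        tree.add(max(frontier, key=weight.get))
    return tree, sum(weight[v] for v in tree)
```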
Assessing the Significance of Groups in High-Dimensional Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.171
G. McLachlan
Abstract: We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification, where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation to one of the groups. Even in this case with labelled training data, care has to be taken with the estimation of the error rate, at least for high-dimensional data, to avoid an overly optimistic assessment due to selection biases. In the case of unlabelled data, the problem of assessing whether groups identified from some data mining or cluster analytic procedure are genuine can be quite challenging, in particular for a large number of variables. We shall focus on the use of a resampling approach to this problem, applied in conjunction with factor analytic models for the generation of the bootstrap samples under the null hypothesis for the number of groups. The proposed methods are to be demonstrated in their application to some high-dimensional data sets from the bioinformatics literature.
Citations: 1
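As a generic illustration of the resampling idea only: fit a g-component and a (g+1)-component mixture, compute a likelihood-ratio statistic, and compare it against statistics from data regenerated under the fitted g-component null. The abstract refers to factor-analytic models, whereas this sketch uses plain Gaussian mixtures; scikit-learn, the statistic scaling, and the bootstrap size are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(X, g, n_boot=50):
    """Parametric-bootstrap test of 'g clusters' vs 'g+1 clusters'.
    X: 2-D array of observations."""
    def lr_stat(data):
        null = GaussianMixture(n_components=g, random_state=0).fit(data)
        alt = GaussianMixture(n_components=g + 1, random_state=0).fit(data)
        # score() returns the mean log-likelihood per sample
        return 2 * len(data) * (alt.score(data) - null.score(data))

    observed = lr_stat(X)
    null_model = GaussianMixture(n_components=g, random_state=0).fit(X)
    null_stats = []
    for _ in range(n_boot):
        Xb, _ = null_model.sample(len(X))   # data generated under the null hypothesis
        null_stats.append(lr_stat(Xb))
    p_value = float(np.mean([s >= observed for s in null_stats]))
    return observed, p_value
```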