2010 IEEE International Conference on Data Mining: Latest Publications

Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.21
D. Sart, A. Mueen, W. Najjar, Eamonn J. Keogh, V. Niennattrakul
Abstract: Many time series data mining problems require subsequence similarity search as a subroutine. Dozens of similarity/distance measures have been proposed in the last decade and there is increasing evidence that Dynamic Time Warping (DTW) is the best measure across a wide range of domains. Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy. Proposed speedup techniques include early abandoning strategies, lower-bound based pruning, indexing and embedding. In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions. With this motivation, we investigate both GPU (Graphics Processing Unit) and FPGA (Field Programmable Gate Array) based acceleration of subsequence similarity search under the DTW measure. As we shall show, our novel algorithms allow GPUs to achieve two orders of magnitude speedup and FPGAs to produce four orders of magnitude speedup. We conduct detailed case studies on the classification of astronomical observations and demonstrate that our ideas allow us to tackle problems that would be untenable otherwise.
Citations: 131
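For orientation only, the quadratic dynamic program that this paper offloads to GPUs and FPGAs fits in a few lines. The sketch below is a plain, unoptimized Python version with a brute-force subsequence scan; it is not the authors' hardware implementation, and the function names are illustrative. The software baselines the abstract mentions add lower bounding and early abandoning on top of this inner loop.

```python
import numpy as np

def dtw_distance(a, b):
    """Naive O(n*m) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])

def best_match(query, series):
    """Subsequence search: slide the query over a long series, keep the best match."""
    best_dist, best_pos = np.inf, -1
    m = len(query)
    for start in range(len(series) - m + 1):
        d = dtw_distance(query, series[start:start + m])
        if d < best_dist:
            best_dist, best_pos = d, start
    return best_pos, best_dist
```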
Separation of Interleaved Web Sessions with Heuristic Search
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.43
Marko Pozenel, V. Mahnic, M. Kukar
Abstract: We describe a heuristic search-based method for interleaved HTTP (Web) session reconstruction, built upon first-order Markov models. An interleaved session is generated by a user who is concurrently browsing the same web site in two or more web sessions (browser tabs or windows). In order to assure data quality for subsequent phases in analyzing a user's browsing behavior, such sessions need to be separated in advance. We propose a separation process based on best-first search and trained first-order Markov chains. We develop a testing method based on various measures of the similarity of reconstructed sessions to the original ones. We evaluate the developed method on two real-world clickstream data sources: a web shop and a university student records information system. Preliminary results show that the proposed method performs well.
Citations: 8
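To make the core idea concrete, here is a deliberately simplified greedy variant, not the authors' best-first search: each incoming click is appended to whichever open session's last page has the highest trained first-order transition probability to the new page, or it starts a new session when no transition is likely enough. The transition table and the new_session_prob threshold are assumed inputs.

```python
def separate_clicks(clicks, trans_prob, new_session_prob=0.01):
    """clicks: ordered list of page ids from one user's interleaved stream.
    trans_prob: dict mapping (prev_page, next_page) -> trained transition probability."""
    sessions = []                      # each session is a list of page ids
    for page in clicks:
        scores = [trans_prob.get((s[-1], page), 0.0) for s in sessions]
        if sessions and max(scores) > new_session_prob:
            # continue the open session that best explains this click
            sessions[scores.index(max(scores))].append(page)
        else:
            sessions.append([page])    # start a new interleaved session
    return sessions
```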
Scalable Influence Maximization in Social Networks under the Linear Threshold Model
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.118
Wei Chen, Yifei Yuan, Li Zhang
Abstract: Influence maximization is the problem of finding a small set of most influential nodes in a social network so that their aggregated influence in the network is maximized. In this paper, we study influence maximization in the linear threshold model, one of the important models formalizing the behavior of influence propagation in social networks. We first show that computing exact influence in general networks in the linear threshold model is #P-hard, which closes an open problem left in the seminal work on influence maximization by Kempe, Kleinberg, and Tardos, 2003. In contrast, we show that computing influence in directed acyclic graphs (DAGs) can be done in time linear in the size of the graphs. Based on the fast computation in DAGs, we propose the first scalable influence maximization algorithm tailored for the linear threshold model. We conduct extensive simulations to show that our algorithm is scalable to networks with millions of nodes and edges, is orders of magnitude faster than the greedy approximation algorithm proposed by Kempe et al. and its optimized versions, and performs consistently among the best algorithms, while other heuristic algorithms not designed specifically for the linear threshold model have unstable performance on different real-world networks.
Citations: 878
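As background for the model this paper targets, a single Monte Carlo realization of linear threshold diffusion can be simulated as below; greedy algorithms in the Kempe et al. line estimate a seed set's expected spread by averaging many such runs. This is a generic sketch of the LT model, not the paper's DAG-based algorithm, and the edge representation is an assumption.

```python
import random

def linear_threshold_spread(seeds, in_edges, rng=None):
    """One LT realization. in_edges: dict mapping node -> list of (in_neighbor, weight),
    with incoming weights at each node summing to at most 1."""
    rng = rng or random.Random(0)
    thresholds = {v: rng.random() for v in in_edges}   # uniform random thresholds
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in in_edges.items():
            if v in active:
                continue
            # a node activates once its active in-neighbors' weights reach its threshold
            if sum(w for u, w in incoming if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active   # influence spread of the seed set in this realization
```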
MoodCast: Emotion Prediction via Dynamic Continuous Factor Graph Model
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.105
Yuan Zhang, Jie Tang, Jimeng Sun, Yiran Chen, Jinghai Rao
Abstract: Human emotion is one important underlying force affecting and affected by the dynamics of social networks. An interesting question is "can we predict a person's mood based on his historic emotion log and his social network?". In this paper, we propose MoodCast, a method based on a dynamic continuous factor graph model for modeling and predicting users' emotions in a social network. MoodCast incorporates users' dynamic status information (e.g., locations, activities, and attributes) and social influence from users' friends into a unified model. Based on the historical information (e.g., network structure and users' status from time 0 to t−1), MoodCast learns a discriminative model for predicting users' emotion status at time t. To the best of our knowledge, this work takes the first step in designing a principled model for emotion prediction in social networks. Our experimental results on both a real social network and a virtual web-based network show that we can accurately predict the emotion status of more than 62% of users, an improvement of more than 8% over the baseline methods.
Citations: 52
A Variance Reduction Framework for Stable Feature Selection
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1002/sam.11152
Yue Han, Lei Yu
Abstract: Besides high accuracy, the stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin-based instance weighting, which weights training instances according to their influence on the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world microarray data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.
Citations: 71
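The abstract does not spell out the weighting rule, so the following is only a hypothetical illustration of the general idea: compute a Relief-style hypothesis margin for each training instance and downweight instances with atypical margins, so that feature-relevance estimates fluctuate less across training samples. Both functions and the specific weighting formula are assumptions, not the authors' scheme.

```python
import numpy as np

def hypothesis_margins(X, y):
    """Relief-style hypothesis margin: distance to nearest miss minus distance to
    nearest hit. Assumes every class has at least two instances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    margins = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        nearest_hit = d[y == y[i]].min()
        nearest_miss = d[y != y[i]].min()
        margins[i] = nearest_miss - nearest_hit
    return margins

def instance_weights(X, y):
    m = hypothesis_margins(X, y)
    # instances whose margins deviate strongly from the average contribute more
    # variance to relevance estimates, so give them smaller weights (one plausible
    # choice, not necessarily the paper's)
    dev = np.abs(m - m.mean())
    return 1.0 / (1.0 + dev)
```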
Transfer Learning via Cluster Correspondence Inference
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.146
Mingsheng Long, Wei-min Cheng, Xiaoming Jin, Jianmin Wang, Dou Shen
Abstract: Transfer learning aims to leverage knowledge from one domain for tasks in a new domain. It finds abundant applications, such as text/sentiment classification. Many previous works are based on cluster analysis and assume some common clusters shared by both domains. They mainly focus on a one-to-one cluster correspondence to bridge different domains. However, such a correspondence scheme might be too strong for real applications, where each cluster in one domain corresponds to many clusters in the other domain. In this paper, we propose a Cluster Correspondence Inference (CCI) method to iteratively infer many-to-many correspondences among clusters from different domains. Specifically, word clusters and document clusters are extracted for each domain using nonnegative matrix factorization, then the word clusters from different domains are corresponded in a many-to-many scheme, with the shared word space as a bridge. These two steps are run iteratively, and label information is transferred from the source domain to the target domain through the inferred cluster correspondence. Experiments on various real data sets demonstrate that our method outperforms several state-of-the-art approaches for cross-domain text classification.
Citations: 13
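To illustrate the building blocks only, not the CCI algorithm itself: each domain's word-by-document matrix can be factored with NMF, and the resulting word-cluster columns compared across domains in the shared word space to obtain a soft many-to-many correspondence. The use of scikit-learn, the similarity threshold, and the function name are assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

def word_cluster_correspondence(X_src, X_tgt, k=5, threshold=0.3):
    """X_src, X_tgt: nonnegative word-by-document matrices that share the same
    vocabulary (rows). Returns a boolean k x k many-to-many correspondence matrix."""
    # columns of W are word clusters expressed over the shared vocabulary
    W_src = NMF(n_components=k, init="nndsvda", random_state=0).fit_transform(X_src)
    W_tgt = NMF(n_components=k, init="nndsvda", random_state=0).fit_transform(X_tgt)
    # compare source and target word clusters in the shared word space
    sim = cosine_similarity(W_src.T, W_tgt.T)
    # a cluster may correspond to several clusters in the other domain
    return sim >= threshold
```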
Constraint Based Dimension Correlation and Distance Divergence for Clustering High-Dimensional Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.15
Xianchao Zhang, Yao Wu, Yang Qiu
Abstract: Clusters are hidden in subspaces of high-dimensional data, i.e., only a subset of features is relevant for each cluster. Subspace clustering is challenging since the search for the relevant features of each cluster and the detection of the final clusters are circularly dependent and should be solved simultaneously. In this paper, we point out that feature correlation and distance divergence are important to subspace clustering, but neither has been considered in previous works. Feature correlation groups correlated features independently and thus helps to reduce the search space for the relevant-feature search problem. Distance divergence distinguishes distances on different dimensions and helps to find the final clusters accurately. We tackle the two problems with the aid of a small amount of domain knowledge in the form of must-links and cannot-links. We then devise a semi-supervised subspace clustering algorithm, CDCDD. CDCDD integrates our solutions to the feature correlation and distance divergence problems, and uses an adaptive dimension voting scheme derived from a previous unsupervised subspace clustering algorithm, FINDIT. Experimental results on both synthetic data sets and real data sets show that the proposed CDCDD algorithm outperforms FINDIT in terms of accuracy, and outperforms the other constraint-based algorithm, SCMINER, in terms of both accuracy and efficiency.
Citations: 6
Active Learning from Multiple Noisy Labelers with Varied Costs
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.147
Yaling Zheng, Stephen Scott, Kun Deng
Abstract: In active learning, where a learning algorithm has to purchase the labels of its training examples, it is often assumed that there is only one labeler available to label examples, and that this labeler is noise-free. In reality, it is possible that there are multiple labelers available (such as human labelers in the online annotation tool Amazon Mechanical Turk) and that each such labeler has a different cost and accuracy. We address the active learning problem with multiple labelers, where each labeler has a different (known) cost and a different (unknown) accuracy. Our approach uses the idea of adjusted cost, which allows labelers with different costs and accuracies to be directly compared. This allows our algorithm to find low-cost combinations of labelers that result in high-accuracy labelings of instances. Our algorithm further reduces costs by pruning underperforming labelers from the set under consideration, and by halting the process of estimating the accuracy of the labelers as early as it can. We found that our algorithm often outperforms, and is always competitive with, other algorithms in the literature.
Citations: 45
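The abstract does not give the adjusted-cost formula, so the sketch below uses one natural but assumed reading: a labeler's cost divided by its estimated accuracy above chance, which makes cheap-but-noisy and expensive-but-accurate labelers directly comparable. The greedy selection loop, the chance level, and both function names are illustrative assumptions, not the paper's algorithm.

```python
def adjusted_cost(cost, est_accuracy, chance=0.5):
    """Cost per unit of estimated accuracy above random guessing (assumed definition)."""
    edge = max(est_accuracy - chance, 1e-9)
    return cost / edge

def cheapest_useful_labelers(labelers, budget):
    """labelers: list of (name, cost, estimated_accuracy) tuples.
    Greedily pick labelers with the best adjusted cost until the budget runs out."""
    ranked = sorted(labelers, key=lambda l: adjusted_cost(l[1], l[2]))
    chosen, spent = [], 0.0
    for name, cost, acc in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen
```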
Bonsai: Growing Interesting Small Trees
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.86
Stephan Seufert, Srikanta J. Bedathur, Julián Mestre, G. Weikum
Abstract: Graphs are increasingly used to model a variety of loosely structured data such as biological or social networks and entity relationships. Given this profusion of large-scale graph data, efficiently discovering interesting substructures buried within is essential. These substructures are typically used in determining subsequent actions, such as conducting visual analytics by humans or designing expensive biomedical experiments. In such settings, it is often desirable to constrain the size of the discovered results in order to directly control the associated costs. In this paper, we address the problem of finding cardinality-constrained connected subtrees in large node-weighted graphs that maximize the sum of weights of the selected nodes. We provide an efficient constant-factor approximation algorithm for this strongly NP-hard problem. Our techniques can be applied in a wide variety of application settings, for example in differential analysis of graphs, a problem that frequently arises in bioinformatics but also has applications on the web.
Citations: 7
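To state the optimization problem concretely, the naive greedy baseline below grows a connected set of at most k nodes by repeatedly attaching the heaviest node adjacent to the current tree. It carries no approximation guarantee and is not the paper's constant-factor algorithm; the adjacency-dict representation is an assumption.

```python
def greedy_heavy_tree(adj, weight, k):
    """adj: dict node -> set of neighbor nodes; weight: dict node -> nonnegative weight.
    Returns a connected node set of size <= k and its total weight (tree edges are
    implicit: each added node attaches to some neighbor already in the set)."""
    root = max(weight, key=weight.get)      # start from the heaviest node
    tree = {root}
    while len(tree) < k:
        frontier = {v for u in tree for v in adj[u]} - tree
        if not frontier:
            break                           # the component is exhausted
        tree.add(max(frontier, key=weight.get))
    return tree, sum(weight[v] for v in tree)
```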
Assessing the Significance of Groups in High-Dimensional Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.171
G. McLachlan
Abstract: We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification, where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation to one of the groups. Even in this case with labelled training data, care has to be taken with the estimation of the error rate, at least for high-dimensional data, to avoid an overly optimistic assessment due to selection biases. In the case of unlabelled data, the problem of assessing whether groups identified from some data mining or cluster analytic procedure are genuine can be quite challenging, in particular for a large number of variables. We shall focus on the use of a resampling approach to this problem, applied in conjunction with factor analytic models for the generation of the bootstrap samples under the null hypothesis for the number of groups. The proposed methods are to be demonstrated in their application to some high-dimensional data sets from the bioinformatics literature.
Citations: 1
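As a generic illustration of the resampling idea only: fit a g-component and a (g+1)-component mixture, compute a likelihood-ratio statistic, and compare it against statistics from data regenerated under the fitted g-component null. The abstract refers to factor-analytic models, whereas this sketch uses plain Gaussian mixtures; scikit-learn, the statistic scaling, and the bootstrap size are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(X, g, n_boot=50):
    """Parametric-bootstrap test of 'g clusters' vs 'g+1 clusters'.
    X: 2-D array of observations."""
    def lr_stat(data):
        null = GaussianMixture(n_components=g, random_state=0).fit(data)
        alt = GaussianMixture(n_components=g + 1, random_state=0).fit(data)
        # score() returns the mean log-likelihood per sample
        return 2 * len(data) * (alt.score(data) - null.score(data))

    observed = lr_stat(X)
    null_model = GaussianMixture(n_components=g, random_state=0).fit(X)
    null_stats = []
    for _ in range(n_boot):
        Xb, _ = null_model.sample(len(X))   # data generated under the null hypothesis
        null_stats.append(lr_stat(Xb))
    p_value = float(np.mean([s >= observed for s in null_stats]))
    return observed, p_value
```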