Latest Publications: 2010 IEEE International Conference on Data Mining

Permutations as Angular Data: Efficient Inference in Factorial Spaces
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.122
S. Plis, T. Lane, V. Calhoun
Abstract: Distributions over permutations arise in applications ranging from multi-object tracking to ranking of instances. The difficulty of dealing with these distributions is caused by the size of their domain, which is factorial in the number of considered entities ($n!$). This makes the direct definition of a multinomial distribution over the permutation space impractical for all but very small $n$. In this work we propose an embedding of all $n!$ permutations for a given $n$ on the surface of a hypersphere defined in $\mathbb{R}^{n-1}$. As a result of the embedding, we acquire the ability to define continuous distributions over a hypersphere with all the benefits of directional statistics. We provide polynomial-time projections between the continuous hypersphere representation and the $n!$-element permutation space. The framework provides a way to use continuous directional probability densities, and the methods developed for them, to establish densities over permutations. As a demonstration of the benefits of the framework, we derive an inference procedure for a state-space model over permutations. We demonstrate the approach with simulations on a number of objects hardly manageable by state-of-the-art inference methods, and with an application to a real flight traffic control dataset.
Citations: 5
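As a quick, self-contained check of the geometric idea (a sketch, not the authors' exact construction): writing each permutation of {1, ..., n} as a vector and subtracting the common mean (n+1)/2 places all n! permutations at one fixed distance from the origin, inside the (n-1)-dimensional hyperplane whose coordinates sum to zero, i.e. on a sphere that directional statistics can handle.

```python
import itertools
import math

def embed(perm):
    """Center a permutation vector; all centered permutations share one norm."""
    n = len(perm)
    mean = (n + 1) / 2.0
    return [p - mean for p in perm]

n = 4
points = [embed(p) for p in itertools.permutations(range(1, n + 1))]
norms = {round(math.sqrt(sum(x * x for x in v)), 9) for v in points}
sums = {round(sum(v), 9) for v in points}
assert len(norms) == 1   # all 4! = 24 points lie on a single sphere
assert sums == {0.0}     # inside the hyperplane sum(x) = 0, of dimension n-1
print(len(points), norms)
```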
Generalized Probabilistic Matrix Factorizations for Collaborative Filtering
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.116
Hanhuai Shan, A. Banerjee
Abstract: Probabilistic matrix factorization (PMF) methods have shown great promise in collaborative filtering. In this paper, we consider several variants and generalizations of the PMF framework inspired by three broad questions: Are the prior distributions used in existing PMF models suitable, or can one get better predictive performance with different priors? Are there suitable extensions to leverage side information? Are there benefits to taking into account row and column biases? We develop new families of PMF models to address these questions, along with efficient approximate inference algorithms for learning and prediction. Through extensive experiments on movie recommendation datasets, we illustrate that simpler models directly capturing correlations among latent factors can outperform existing PMF models, that side information can benefit prediction accuracy, and that accounting for row/column biases leads to improvements in predictive performance.
Citations: 162
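For context, a minimal vanilla-PMF baseline (plain Gaussian priors, no side information or biases, so none of the paper's generalizations; the toy rating matrix and hyperparameters are invented) can be fit by gradient steps on R ≈ UVᵀ over the observed entries:

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])        # toy ratings; 0 marks "unobserved"
mask = R > 0
k, lam, lr = 2, 0.05, 0.01             # latent dim, prior strength, step size
U = 0.1 * rng.standard_normal((4, k))
V = 0.1 * rng.standard_normal((4, k))
for _ in range(5000):                  # gradient ascent on the MAP objective
    E = mask * (R - U @ V.T)           # residuals on observed entries only
    U, V = U + lr * (E @ V - lam * U), V + lr * (E.T @ U - lam * V)
pred = U @ V.T
rmse = np.sqrt(((mask * (R - pred)) ** 2).sum() / mask.sum())
print(round(rmse, 3))
```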
SONNET: Efficient Approximate Nearest Neighbor Using Multi-core
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.157
M. Hasan, Hilmi Yildirim, Abhirup Chakraborty
Abstract: Approximate nearest neighbor search over high-dimensional data is an important problem with a wide range of practical applications. In this paper, we propose SONNET, a simple, multi-core-friendly approximate nearest neighbor algorithm based on rank aggregation. SONNET is particularly suitable for very high-dimensional data: its performance improves as the dimension increases, whereas the majority of existing algorithms show the reverse trend. Furthermore, most existing algorithms are hard to parallelize, either due to their sequential nature or their inherent complexity. SONNET, by contrast, has parallelism embedded in the core concept of the algorithm, which earns it an almost linear speed-up as the number of cores increases. Finally, SONNET is very easy to implement, and its approximation parameter is intuitively simple.
Citations: 6
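The rank-aggregation idea can be sketched as follows (an illustrative reading, not SONNET's actual algorithm; the Borda aggregation and the candidate-verification step are assumptions). Each dimension independently ranks points by closeness to the query, the ranks are aggregated, and only the top candidates get exact distance checks; the per-dimension rankings are independent, which is where multi-core parallelism would enter.

```python
import numpy as np

def rank_aggregate_nn(data, q, n_candidates=10):
    """Rank points per dimension by closeness to the query, Borda-aggregate
    the ranks, then verify the best candidates with exact distances."""
    n, d = data.shape
    scores = np.zeros(n)
    for j in range(d):                  # embarrassingly parallel over dims
        order = np.argsort(np.abs(data[:, j] - q[j]))
        ranks = np.empty(n)
        ranks[order] = np.arange(n)
        scores += ranks                 # Borda count across dimensions
    cand = np.argsort(scores)[:n_candidates]
    return cand[np.argmin(np.linalg.norm(data[cand] - q, axis=1))]

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64))
q = X[42] + 0.01 * rng.standard_normal(64)   # query placed near point 42
best = rank_aggregate_nn(X, q)
print(best)
```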
Subgroup Discovery Meets Bayesian Networks -- An Exceptional Model Mining Approach
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.53
W. Duivesteijn, A. Knobbe, Ad Feelders, Matthijs van Leeuwen
Abstract: Whenever a dataset has multiple discrete target variables, we want our algorithms to consider not only the variables themselves, but also the interdependencies between them. We propose to use these interdependencies to quantify the quality of subgroups, by integrating Bayesian networks with the Exceptional Model Mining framework. Within this framework, candidate subgroups are generated. For each candidate, we fit a Bayesian network on the target variables. Then we compare the network's structure to the structure of the Bayesian network fitted on the whole dataset. To perform this comparison, we define an edit-distance-based metric appropriate for Bayesian networks. We show interesting subgroups that our method found experimentally on datasets from music theory, semantic scene classification, biology, and zoogeography.
Citations: 67
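An edit-distance-style comparison of two network structures can be sketched as the symmetric difference of their directed edge sets (a stand-in illustration with invented edges; the paper defines its own metric for Bayesian networks):

```python
def structure_distance(edges_a, edges_b):
    """Number of edge insertions/deletions needed to turn one directed
    edge set into the other (symmetric difference)."""
    a, b = set(edges_a), set(edges_b)
    return len(a ^ b)

whole = {("A", "B"), ("B", "C"), ("C", "D")}      # fitted on the full data
subgroup = {("A", "B"), ("C", "B"), ("C", "D")}   # fitted on a subgroup
d = structure_distance(whole, subgroup)
print(d)   # the reversed edge B->C counts as one deletion plus one insertion
```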
Finding Local Anomalies in Very High Dimensional Space
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.151
T. D. Vries, S. Chawla, M. Houle
Abstract: Time, cost, and energy efficiency are critical factors for many data analysis techniques when the size and dimensionality of the data are very large. We investigate the use of the Local Outlier Factor (LOF) for data of this type, providing a motivating example from real-world data. We propose Projection-Indexed Nearest Neighbours (PINN), a novel technique that exploits extended nearest-neighbour sets in a reduced-dimensional space to create an accurate approximation of k-nearest-neighbour distances, which is used as the core density measurement within LOF. The reduced dimensionality allows for indexing that is sub-quadratic in the number of items in the dataset, where previously only quadratic performance was possible. A detailed theoretical analysis of Random Projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the dataset after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world datasets of up to 300,000 elements and 102,600 dimensions.
Citations: 86
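The projection-then-refine idea can be sketched as follows (function names, the Gaussian projection, and the candidate-extension factor are illustrative, not the paper's exact PINN procedure): find an extended candidate set in a random low-dimensional projection, then compute exact k-NN distances in the original space over those candidates only.

```python
import numpy as np

def projected_knn(X, k=5, target_dim=16, extend=3):
    """Approximate k-NN: shortlist extend*k candidates in a random
    projection, then rank the shortlist by exact distances."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    P = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    Y = X @ P                                       # random projection
    knn = np.empty((n, k), dtype=int)
    for i in range(n):
        approx = np.linalg.norm(Y - Y[i], axis=1)
        cand = np.argsort(approx)[1:extend * k + 1]  # skip the point itself
        exact = np.linalg.norm(X[cand] - X[i], axis=1)
        knn[i] = cand[np.argsort(exact)[:k]]
    return knn

X = np.random.default_rng(2).standard_normal((200, 100))
knn = projected_knn(X)
print(knn[:2])
```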
Active Spectral Clustering
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.119
Xiang Wang, I. Davidson
Abstract: The technique of spectral clustering is widely used to segment a range of data, from graphs to images. Our work marks a natural progression of spectral clustering from the original passive unsupervised formulation to our active semi-supervised formulation. We follow the widely used area of constrained clustering and allow supervision in the form of pairwise relations between two nodes: Must-Link and Cannot-Link. Unlike most previous constrained clustering work, our constraints are specified incrementally by querying an oracle (a domain expert). Since in practice each query comes with a cost, our goal is to maximally improve the result with as few queries as possible. The advantages of our approach are: 1) it is principled, querying the constraints that maximally reduce the expected error; 2) it can incorporate both hard and soft constraints, which are prevalent in practice. We empirically show that our method significantly outperforms the baseline approach, namely constrained spectral clustering with randomly selected constraints, on UCI benchmark datasets.
Citations: 89
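How pairwise constraints enter a spectral formulation can be sketched by editing the affinity matrix before the eigendecomposition (the affinity values are invented, and the paper's expected-error query selection is not reproduced): an answered Cannot-Link zeroes the affinity between the pair, and the sign of the Fiedler vector of the resulting graph Laplacian gives a 2-way cut.

```python
import numpy as np

W = np.array([[0., .9, .1, .1],
              [.9, 0., .2, .1],
              [.1, .2, 0., .8],
              [.1, .1, .8, 0.]])
W[1, 2] = W[2, 1] = 0.0          # oracle answered Cannot-Link for (1, 2)
D = np.diag(W.sum(axis=1))
L = D - W                        # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)   # sign of the Fiedler vector
print(labels)
```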
Exponential Family Tensor Factorization for Missing-Values Prediction and Anomaly Detection
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.39
K. Hayashi, Takashi Takenouchi, T. Shibata, Yuki Kamiya, Daishi Kato, Kazuo Kunieda, Keiji Yamada, K. Ikeda
Abstract: In this paper, we study probabilistic modeling of heterogeneously attributed multi-dimensional arrays. The model manages the heterogeneity by employing an individual exponential-family distribution for each attribute of the tensor array. These distributions are connected by latent variables, which share information across the different attributes. Because exact Bayesian inference for our model is intractable, we derive an EM algorithm approximated using the Laplace method and Gaussian processes. This approximation enables us to derive a predictive distribution for missing values in a consistent manner. Simulation experiments show that our method outperforms other methods, such as PARAFAC and Tucker decomposition, in missing-values prediction for cross-national statistics, and that it is also applicable to discovering anomalies in heterogeneous office-logging data.
Citations: 21
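As a point of reference, the PARAFAC/CP baseline mentioned above can be fit by alternating least squares on a synthetic rank-2 tensor (a sketch of the baseline only, not of the paper's exponential-family model; the dimensions and rank are invented):

```python
import numpy as np

def kr(X, Y):
    """Khatri-Rao (column-wise Kronecker) product."""
    return np.einsum("ir,jr->ijr", X, Y).reshape(-1, X.shape[1])

rng = np.random.default_rng(3)
I, J, K, r = 5, 6, 7, 2
A, B, C = (rng.standard_normal((s, r)) for s in (I, J, K))
T = np.einsum("ir,jr,kr->ijk", A, B, C)       # ground-truth rank-2 tensor
U, V, W = (rng.standard_normal((s, r)) for s in (I, J, K))
for _ in range(100):                          # alternating least squares
    U = np.linalg.lstsq(kr(V, W), T.reshape(I, -1).T, rcond=None)[0].T
    V = np.linalg.lstsq(kr(U, W), T.transpose(1, 0, 2).reshape(J, -1).T, rcond=None)[0].T
    W = np.linalg.lstsq(kr(U, V), T.transpose(2, 0, 1).reshape(K, -1).T, rcond=None)[0].T
err = np.linalg.norm(T - np.einsum("ir,jr,kr->ijk", U, V, W)) / np.linalg.norm(T)
print(round(err, 6))
```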
Supervised Link Prediction Using Multiple Sources
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.112
Zhengdong Lu, Berkant Savas, Wei Tang, I. Dhillon
Abstract: Link prediction is a fundamental problem in social network analysis and in modern-day commercial applications such as Facebook and MySpace. Most existing research approaches this problem by exploring the topological structure of a social network using only one source of information. However, in many application domains, in addition to the social network of interest, a number of auxiliary social networks and/or derived proximity networks are available. The contribution of the paper is twofold: (1) a supervised learning framework that can effectively and efficiently learn the dynamics of social networks in the presence of auxiliary networks; (2) a feature design scheme for constructing a rich variety of path-based features using multiple sources, together with an effective feature selection strategy based on structured sparsity. Extensive experiments on three real-world collaboration networks show that our model can effectively learn to predict new links using multiple sources, yielding higher prediction accuracy than unsupervised and single-source supervised models.
Citations: 116
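Path-based features of this kind can be sketched with adjacency-matrix powers: (A²)[i, j] counts common neighbours of i and j, (A³)[i, j] counts length-3 paths, and stacking such counts from an auxiliary network gives a multi-source feature vector for a supervised model (the toy graphs are invented; the paper's feature design and structured-sparsity selection are richer).

```python
import numpy as np

A = np.array([[0, 1, 1, 0],       # primary network
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[0, 0, 1, 1],       # auxiliary network over the same nodes
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
A2, A3, B2 = A @ A, A @ A @ A, B @ B
i, j = 0, 3                       # candidate pair to score
features = [A2[i, j], A3[i, j], B2[i, j]]
print(features)                   # input to e.g. a logistic regression
```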
An Approach Based on Tree Kernels for Opinion Mining of Online Product Reviews
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.104
Peng Jiang, Chunxia Zhang, Hongping Fu, Zhendong Niu, Qing Yang
Abstract: Opinion mining is the challenging task of identifying the opinions or sentiments underlying user-generated content such as online product reviews, blogs, and discussion forums. Previous studies that adopt machine learning algorithms mainly focus on designing effective features for this complex task. This paper presents our approach based on tree kernels for opinion mining of online product reviews. Tree kernels alleviate the complexity of feature selection and generate effective features that satisfy the special requirements of opinion mining. In this paper, we define several tree kernels for sentiment expression extraction and sentiment classification, which are subtasks of opinion mining. Our proposed tree kernels encode not only syntactic structure information but also sentiment-related information, such as sentiment boundary and sentiment polarity, which are important features for opinion mining. Experimental results on a benchmark dataset indicate that tree kernels can significantly improve the performance of both sentiment expression extraction and sentiment classification. In addition, a linear combination of our proposed tree kernels and a traditional feature-vector kernel achieves the best performance on the benchmark dataset.
Citations: 39
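A plain subtree-counting kernel (a simplified Collins-Duffy-style kernel on nested-tuple trees, without the sentiment-boundary and polarity information the paper adds) can be sketched as:

```python
def nodes(t):
    """Yield every node of a nested-tuple tree (leaves are strings)."""
    yield t
    if isinstance(t, tuple):
        for c in t[1:]:
            yield from nodes(c)

def label(t):
    return t[0] if isinstance(t, tuple) else t

def C(n1, n2):
    """Count matching tree fragments rooted at the node pair (n1, n2)."""
    if label(n1) != label(n2):
        return 0
    if not isinstance(n1, tuple) or not isinstance(n2, tuple):
        return 1                          # matching leaves
    if [label(c) for c in n1[1:]] != [label(c) for c in n2[1:]]:
        return 0                          # productions differ
    prod = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        prod *= 1 + C(c1, c2)
    return prod

def tree_kernel(t1, t2):
    """Sum fragment matches over all node pairs of the two trees."""
    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", "good"), ("VP", "buy"))
t2 = ("S", ("NP", "good"), ("VP", "sell"))
k12 = tree_kernel(t1, t2)
print(k12)
```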
Exploiting Local Data Uncertainty to Boost Global Outlier Detection
2010 IEEE International Conference on Data Mining · Pub Date: 2010-12-13 · DOI: 10.1109/ICDM.2010.10
Bo Liu, Jie Yin, Yanshan Xiao, Longbing Cao, Philip S. Yu
Abstract: This paper presents a novel hybrid approach to outlier detection that incorporates local data uncertainty into the construction of a global classifier. To deal with local data uncertainty, we introduce a confidence value for each example in the training data, which measures the strength of the corresponding class label. Our proposed method works in two steps. First, we generate a pseudo training dataset by computing a confidence value for each input example on its class label. We present two different mechanisms for computing the confidence values from local data behavior: a kernel k-means clustering algorithm and a kernel LOF-based algorithm. Second, we construct a global classifier for outlier detection by generalizing the SVDD-based learning framework to incorporate both positive and negative examples as well as their associated confidence values. By integrating local and global outlier detection, our proposed method explicitly handles the uncertainty of the input data and reduces the sensitivity of SVDD to noise. Extensive experiments on real-life datasets demonstrate that our proposed method achieves a better tradeoff between detection rate and false alarm rate than four state-of-the-art outlier detection algorithms.
Citations: 16
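The first step, assigning each training example a label-confidence from local data behavior, can be sketched with an average k-NN-distance score (a simplification; the paper uses kernel k-means and kernel-LOF mechanisms, and feeds the resulting values into an SVDD-style learner):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((30, 2)), [[6.0, 6.0]]])  # one far point
k = 5
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)          # pairwise dists
knn_dist = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)       # skip self
conf = 1.0 / (1.0 + knn_dist)     # confidence in (0, 1]; low in sparse regions
suspect = int(conf.argmin())      # the example whose label we trust least
print(suspect)
```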