2008 Eighth IEEE International Conference on Data Mining最新文献

筛选
英文 中文
Non-negative Matrix Factorization on Manifold 流形上的非负矩阵分解
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.57
Deng Cai, Xiaofei He, Xiaoyun Wu, Jiawei Han
{"title":"Non-negative Matrix Factorization on Manifold","authors":"Deng Cai, Xiaofei He, Xiaoyun Wu, Jiawei Han","doi":"10.1109/ICDM.2008.57","DOIUrl":"https://doi.org/10.1109/ICDM.2008.57","url":null,"abstract":"Recently non-negative matrix factorization (NMF) has received a lot of attentions in information retrieval, computer vision and pattern recognition. NMF aims to find two non-negative matrices whose product can well approximate the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure. We demonstrate the success of this novel algorithm by applying it on real world problems.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124931016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 391
Balancing Spectral Clustering for Segmenting Spatio-temporal Observations of Multi-agent Systems 基于平衡谱聚类的多智能体系统时空观测数据分割
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.88
B. Takács, Y. Demiris
{"title":"Balancing Spectral Clustering for Segmenting Spatio-temporal Observations of Multi-agent Systems","authors":"B. Takács, Y. Demiris","doi":"10.1109/ICDM.2008.88","DOIUrl":"https://doi.org/10.1109/ICDM.2008.88","url":null,"abstract":"We examine the application of spectral clustering for breaking up the behavior of a multi-agent system in space and time into smaller, independent elements. We cluster observations of individual entities in order to identify significant changes in the parameter space (like spatial position)and detect temporal alterations of behavior within the same framework. Data is also influenced by knowledge about important events. Clusters are pre-processed at each step of the iterative subdivision to make the algorithm invariant against spatial scaling, rotation, replay speed and varying sampling frequency. A method is presented to balance spatial and temporal segmentation based on the expected group size. We demonstrate our results by analyzing the outcomes of a computer game.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124373888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Clustering Uncertain Data Using Voronoi Diagrams 用Voronoi图聚类不确定数据
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.31
Ben Kao, Sau-dan. Lee, David Wai-Lok Cheung, Wai-Shing Ho, K. F. Chan
{"title":"Clustering Uncertain Data Using Voronoi Diagrams","authors":"Ben Kao, Sau-dan. Lee, David Wai-Lok Cheung, Wai-Shing Ho, K. F. Chan","doi":"10.1109/ICDM.2008.31","DOIUrl":"https://doi.org/10.1109/ICDM.2008.31","url":null,"abstract":"We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdf). We show that the UK-means algorithm, which generalises the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (ED) between objects and cluster representatives. For arbitrary pdf's, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculation. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previous known in the literature. We conduct experiments to evaluate the effectiveness of our pruning techniques and to show that our techniques significantly outperform previous methods.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123541758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 89
Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams 数据流中高实用项集的快速高效挖掘
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.107
Hua-Fu Li, Hsin-Yun Huang, Yi-Cheng Chen, Yu-Jiun Liu, Suh-Yin Lee
{"title":"Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams","authors":"Hua-Fu Li, Hsin-Yun Huang, Yi-Cheng Chen, Yu-Jiun Liu, Suh-Yin Lee","doi":"10.1109/ICDM.2008.107","DOIUrl":"https://doi.org/10.1109/ICDM.2008.107","url":null,"abstract":"Efficient mining of high utility itemsets has become one of the most interesting data mining tasks with broad applications. In this paper, we proposed two efficient one-pass algorithms, MHUI-BIT and MHUI-TID, for mining high utility itemsets from data streams within a transaction-sensitive sliding window. Two effective representations of item information and an extended lexicographical tree-based summary data structure are developed to improve the efficiency of mining high utility itemsets. Experimental results show that the proposed algorithms outperform than the existing algorithms for mining high utility itemsets from data streams.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130166285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 127
Support Vector Regression for Censored Data (SVRc): A Novel Tool for Survival Analysis 删节数据的支持向量回归(SVRc):一种新的生存分析工具
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.50
F. Khan, V. Zubek
{"title":"Support Vector Regression for Censored Data (SVRc): A Novel Tool for Survival Analysis","authors":"F. Khan, V. Zubek","doi":"10.1109/ICDM.2008.50","DOIUrl":"https://doi.org/10.1109/ICDM.2008.50","url":null,"abstract":"A crucial challenge in predictive modeling for survival analysis is managing censored observations in the data. The Cox proportional hazards model is the standard tool for the analysis of continuous censored survival data. We propose a novel machine learning algorithm, support vector regression for censored data (SVRc) for improved analysis of medical survival data. SVRc leverages the high-dimensional capabilities of traditional SVR while adapting it for use with censored data through a modified asymmetric loss/penalty function which allows censored (left and right censored) data to be processed. We applied the new algorithm to predict the recurrence and disease progression of prostate cancer, breast cancer and lung cancer. Compared with the traditional Cox model, SVRc achieves significant improvement in overall accuracy as well as in the ability to identify high-risk and low-risk patient populations.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"25 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114102364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 132
A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach 一种基于信息论的不确定数据聚类层次算法
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.115
Francesco Gullo, Giovanni Ponti, Andrea Tagarelli, S. Greco
{"title":"A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach","authors":"Francesco Gullo, Giovanni Ponti, Andrea Tagarelli, S. Greco","doi":"10.1109/ICDM.2008.115","DOIUrl":"https://doi.org/10.1109/ICDM.2008.115","url":null,"abstract":"In recent years there has been a growing interest in clustering uncertain data. In contrast to traditional, \"sharp\" data representation models, uncertain data objects can be represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In this context, the focus has been mainly on partitional and density-based approaches, whereas hierarchical clustering schemes have drawn less attention. We propose a centroid-linkage-based agglomerative hierarchical algorithm for clustering uncertain objects, named U-AHC. The cluster merging criterion is based on an information-theoretic measure to compute the distance between cluster prototypes. These prototypes are represented as mixture densities that summarize the pdfs of all the uncertain objects in the clusters. Experiments have shown that our method outperforms state-of-the-art clustering algorithms from an accuracy viewpoint while achieving reasonably good efficiency.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114293614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
Isolation Forest 与世隔绝的森林
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.17
Fei Tony Liu, K. Ting, Zhi-Hua Zhou
{"title":"Isolation Forest","authors":"Fei Tony Liu, K. Ting, Zhi-Hua Zhou","doi":"10.1109/ICDM.2008.17","DOIUrl":"https://doi.org/10.1109/ICDM.2008.17","url":null,"abstract":"Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest performs favourably to ORCA, a near-linear time complexity distance-based method, LOF and random forests in terms of AUC and processing time, and especially in large data sets. iForest also works well in high dimensional problems which have a large number of irrelevant attributes, and in situations where training set does not contain any anomalies.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129345191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3344
Why Stacked Models Perform Effective Collective Classification 为什么堆叠模型能有效地进行集体分类
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.126
A. Fast, David D. Jensen
{"title":"Why Stacked Models Perform Effective Collective Classification","authors":"A. Fast, David D. Jensen","doi":"10.1109/ICDM.2008.126","DOIUrl":"https://doi.org/10.1109/ICDM.2008.126","url":null,"abstract":"Collective classification techniques jointly infer all class labels of a relational data set, using the inferences about one class label to influence inferences about related class labels. Kou and Cohen recently introduced an efficient relational model based on stacking that, despite its simplicity, has equivalent accuracy to more sophisticated joint inference approaches. Using experiments on both real and synthetic data, we show that the primary cause for the performance of the stacked model is the reduction in bias from learning the stacked model on inferred labels rather than true labels. The reduction in variance due to conditional inference also contributes to the effect but it is not as strong. In addition, we show that the performance of the joint inference and stacked learners can be attributed to an implicit weighting of local and relational features at learning time.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126961298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking 在线LDA:挖掘文本流的自适应主题模型及其在主题检测和跟踪中的应用
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.140
Loulwah AlSumait, Daniel Barbará, C. Domeniconi
{"title":"On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking","authors":"Loulwah AlSumait, Daniel Barbará, C. Domeniconi","doi":"10.1109/ICDM.2008.140","DOIUrl":"https://doi.org/10.1109/ICDM.2008.140","url":null,"abstract":"This paper presents online topic model (OLDA), a topic model that automatically captures the thematic patterns and identifies emerging topics of text streams and their changes over time. Our approach allows the topic modeling framework, specifically the latent Dirichlet allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the empirical Bayes method is proposed. The idea is to incrementally update the current model according to the information inferred from the new stream of data with no need to access previous data. The dynamics of the proposed approach also provide an efficient mean to track the topics over time and detect the emerging topics in real time. Our method is evaluated both qualitatively and quantitatively using benchmark datasets. In our experiments, the OLDA has discovered interesting patterns by just analyzing a fraction of data at a time. Our tests also prove the ability of OLDA to align the topics across the epochs with which the evolution of the topics over time is captured. The OLDA is also comparable to, and sometimes better than, the original LDA in predicting the likelihood of unseen documents.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121860844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 455
A Novel Method of Combined Feature Extraction for Recognition 一种新的组合特征提取识别方法
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.28
Tingkai Sun, Songcan Chen, Jing-yu Yang, P. Shi
{"title":"A Novel Method of Combined Feature Extraction for Recognition","authors":"Tingkai Sun, Songcan Chen, Jing-yu Yang, P. Shi","doi":"10.1109/ICDM.2008.28","DOIUrl":"https://doi.org/10.1109/ICDM.2008.28","url":null,"abstract":"Multimodal recognition is an emerging technique to overcome the non-robustness of the unimodal recognition in real applications. Canonical correlation analysis (CCA) has been employed as a powerful tool for feature fusion in the realization of such multimodal system. However, CCA is the unsupervised feature extraction and it does not utilize the class information of the samples, resulting in the constraint of the recognition performance. In this paper, the class information is incorporated into the framework of CCA for combined feature extraction, and a novel method of combined feature extraction for multimodal recognition, called discriminative canonical correlation analysis (DCCA), is proposed. The experiments show that DCCA outperforms some related methods of both unimodal recognition and multimodal recognition.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122020728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 127
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信