2011 IEEE 11th International Conference on Data Mining: Latest Papers (Page 6)

Positive and Unlabeled Learning for Graph Classification

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.119

Yuchen Zhao, Xiangnan Kong, Philip S. Yu

Abstract: The problem of graph classification has drawn much attention in the last decade. Conventional approaches to graph classification focus on mining discriminative subgraph features under supervised settings. Their feature selection strategies strictly assume that both positive and negative graphs exist. However, in many real-world applications, negative graph examples are not available. In this paper we study how to select useful subgraph features and perform graph classification based only on positive and unlabeled graphs. This problem is challenging and differs from previous work on PU learning, because graph data has no predefined features. Moreover, the subgraph enumeration problem is NP-hard. We need to identify a subset of unlabeled graphs that are most likely to be negative graphs. However, the negative graph selection problem and the subgraph feature selection problem are correlated: before the reliable negative graphs can be identified, we need a set of useful subgraph features. To address this problem, we first derive an evaluation criterion that estimates the dependency between subgraph features and class labels based on a set of estimated negative graphs. To build accurate models for the PU learning problem on graph data, we then propose an integrated approach that concurrently selects the discriminative features and the negative graphs in an iterative manner. Experimental results illustrate the effectiveness and efficiency of the proposed method.

Citations: 43

Enabling Fast Lazy Learning for Data Streams

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.63

Peng Zhang, Byron J. Gao, Xingquan Zhu, Li Guo

Abstract: Lazy learning, such as k-nearest neighbor learning, has been widely applied in many applications. Known for capturing data locality well, lazy learning can be advantageous for highly dynamic and complex learning environments such as data streams. Yet its high memory consumption and low prediction efficiency have made it less favorable for stream-oriented applications. Specifically, traditional lazy learning stores all the training data and defers the inductive process until a query appears, whereas in stream applications, data records flow continuously in large volumes and class labels must be predicted in a timely manner. In this paper, we provide a systematic solution that overcomes the memory and efficiency limitations and enables fast lazy learning for concept-drifting data streams. In particular, we propose a novel Lazy-tree (L-tree for short) indexing structure that dynamically maintains compact high-level summaries of historical stream records. L-trees are M-Tree-like [5] and height-balanced, and they greatly reduce memory consumption and achieve sub-linear time complexity for prediction. Moreover, L-trees continuously absorb new stream records and discard outdated ones, so they naturally adapt to the dynamically changing concepts in data streams for accurate prediction. Extensive experiments on real-world and synthetic data streams demonstrate the performance of our approach.

Citations: 49
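The abstract contrasts the per-query cost of lazy learning with the timeliness that streams demand. As a point of reference only, the sketch below is a plain sliding-window k-nearest-neighbor learner, not the paper's L-tree: it bounds memory by discarding outdated records, but each prediction still scans the whole window in linear time, which is the cost the L-tree's height-balanced summaries are meant to reduce. The class name `SlidingWindowKNN` and its `window` parameter are hypothetical.

```python
from collections import Counter, deque
from math import dist


class SlidingWindowKNN:
    """Baseline lazy learner for a stream (hypothetical, not the L-tree).

    Keeps only the most recent `window` labeled records and answers each
    query by a linear scan, so memory is bounded but prediction is O(window).
    """

    def __init__(self, k=3, window=1000):
        self.k = k
        # A deque with maxlen drops the oldest record once the window is full.
        self.buffer = deque(maxlen=window)

    def absorb(self, x, y):
        """Add a newly arrived labeled record; outdated ones fall out."""
        self.buffer.append((tuple(x), y))

    def predict(self, x):
        """Majority label among the k buffered records nearest to x."""
        if not self.buffer:
            raise ValueError("no records absorbed yet")
        nearest = sorted(self.buffer, key=lambda rec: dist(rec[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

For example, with `k=1` and `window=2`, absorbing three records evicts the first one, so a query near that evicted point is answered from the two records that remain.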

Semi-supervised Discriminant Hashing

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.128

Saehoon Kim, Seungjin Choi

Abstract: Hashing refers to methods for embedding high-dimensional data into a similarity-preserving low-dimensional Hamming space such that similar objects are indexed by binary codes whose Hamming distances are small. Learning hash functions from data has recently been recognized as a promising approach to approximate nearest neighbor search for high-dimensional data. Most 'learning to hash' methods resort to either unsupervised or supervised learning to determine hash functions. Recently, a semi-supervised learning approach was introduced in hashing, where pairwise constraints (must-link and cannot-link) from labeled data are leveraged while unlabeled data are used for regularization to avoid over-fitting. In this paper we base our semi-supervised hashing on linear discriminant analysis: hash functions are learned such that labeled data are used to maximize the separability between binary codes associated with different classes, while unlabeled data are used for regularization as well as for the balancing condition and pairwise decorrelation of bits. The resulting method is referred to as semi-supervised discriminant hashing (SSDH). Numerical experiments on the MNIST and CIFAR-10 datasets demonstrate that our method outperforms existing methods, especially for short binary codes.

Citations: 26
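The abstract defines hashing as mapping data to binary codes whose Hamming distances reflect similarity. The sketch below shows only that embedding step, assuming linear hash functions that threshold a projection `w · x` at zero; the directions are random Gaussian vectors, as in classic locality-sensitive hashing, and are not learned as in SSDH. The names `linear_hash`, `hamming`, `W`, and `n_bits` are hypothetical.

```python
import random


def linear_hash(x, projections):
    """Binary code for x: bit j is 1 when the dot product w_j . x is positive."""
    return tuple(
        int(sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0) for w in projections
    )


def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


rng = random.Random(0)
dim, n_bits = 4, 8
# Random Gaussian directions (unsupervised, LSH-style). A learned method such
# as SSDH would instead fit these directions to labeled and unlabeled data.
W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
```

Because negating `x` flips the sign of every projection, `linear_hash([-v for v in x], W)` is the bitwise complement of `linear_hash(x, W)` (barring a projection of exactly zero), so the two codes sit at the maximum Hamming distance `n_bits`.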

Sparse Domain Adaptation in Projection Spaces Based on Good Similarity Functions

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.136

Emilie Morvant, Amaury Habrard, S. Ayache

Abstract: We address the problem of domain adaptation for binary classification, which arises when the distributions generating the source learning data and the target test data are somewhat different. We consider the challenging case where no labeled target data is available. From a theoretical standpoint, a classifier has better generalization guarantees when the marginal distributions of the two domains are close. We study a new direction based on a recent framework of Balcan et al. that allows learning linear classifiers in an explicit projection space based on similarity functions that need not be symmetric or positive semi-definite. We propose a general method for learning a good classifier on target data with generalization guarantees, and we improve its efficiency with an iterative procedure that reweights the similarity function (compatibly with the Balcan et al. framework) to move the two distributions closer together in a new projection space. Hyperparameters and reweighting quality are controlled by a reverse validation procedure. Our approach is based on a linear programming formulation and shows good adaptation performance with very sparse models. We evaluate it on a synthetic problem and on a real image annotation task.

Citations: 8
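The abstract relies on learning a linear classifier in an explicit projection space induced by a similarity function that need not be symmetric. The sketch below shows only that projection and the resulting linear score, assuming the space is built from a fixed set of landmark points, `phi(x) = [K(x, l_1), ..., K(x, l_d)]`. The weights `alphas` are set by hand here rather than learned by the paper's linear program, and no reweighting is shown; `project`, `linear_score`, and `asym_sim` are hypothetical names.

```python
from math import exp


def project(x, landmarks, K):
    """Explicit projection: phi(x) = [K(x, l_1), ..., K(x, l_d)]."""
    return [K(x, l) for l in landmarks]


def linear_score(x, landmarks, alphas, K):
    """Linear classifier in the projection space: sum_j alpha_j * K(x, l_j)."""
    return sum(a * k for a, k in zip(alphas, project(x, landmarks, K)))


def asym_sim(x, l):
    """A hypothetical similarity that is not symmetric (hence not a PSD kernel):
    a Gaussian bump scaled by the norm of the second argument only."""
    sq = sum((xi - li) ** 2 for xi, li in zip(x, l))
    norm_l = sum(li * li for li in l) ** 0.5
    return norm_l * exp(-sq)
```

With landmarks `(1, 0)` and `(-1, 0)` weighted `+1` and `-1`, points near the first landmark score positive and points near the second score negative, even though `asym_sim(a, b)` and `asym_sim(b, a)` generally differ.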

ADANA: Active Name Disambiguation

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.19

Xuezhi Wang, Jie Tang, Hong Cheng, Philip S. Yu

Abstract: Name ambiguity has long been viewed as a challenging problem in many applications, such as scientific literature management, people search, and social network analysis. When we search for a person's name in these systems, many documents (e.g., papers, web pages) containing that name may be returned, and it is hard to determine which documents are about the person we care about. Although much research has been conducted, the problem remains largely unsolved, especially with the rapid growth of people information available on the Web. In this paper, we study this problem from a new perspective and propose ADANA, a method for disambiguating person names via active user interactions. In ADANA, we first introduce a pairwise factor graph (PFG) model for person name disambiguation. The model is flexible and can easily be extended by incorporating various features. Based on the PFG model, we propose an active name disambiguation algorithm that aims to improve disambiguation performance by maximizing the utility of the user's corrections. Experimental results on data sets of three different genres show that with only a few user corrections, the error rate of name disambiguation can be reduced to 3.1%. A real system has been developed based on the proposed method and is available online.

Citations: 91

An Analysis of Performance Measures for Binary Classifiers

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.21

Charles Parker

Citations: 53

Discovering Thematic Patterns in Videos via Cohesive Sub-graph Mining

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.55

Gangqiang Zhao, Junsong Yuan

Abstract: Videos in one category usually contain the same thematic pattern, e.g., the spin action in skating videos. Discovering the thematic pattern is essential for understanding and summarizing video content. This paper addresses two critical issues in mining thematic video patterns: (1) automatic discovery of thematic patterns without any training or supervision information, and (2) accurate localization of the occurrences of all thematic patterns in videos. The major contributions are twofold. First, we formulate thematic video pattern discovery as a cohesive sub-graph selection problem: finding a subset of visual words that are spatio-temporally collocated. Spatio-temporal branch-and-bound search can then locate all instances accurately. Second, a novel method is proposed to efficiently find the cohesive sub-graph with the maximum overall mutual information score. Our experimental results on challenging commercial and action videos show that our approach can discover different types of thematic patterns despite variations in scale, viewpoint, color, and lighting conditions, or partial occlusions. Our approach is also robust to videos with cluttered and dynamic backgrounds.

Citations: 19

Exploiting False Discoveries -- Statistical Validation of Patterns and Quality Measures in Subgroup Discovery

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.65

W. Duivesteijn, A. Knobbe

Citations: 55

Co-clustering for Binary and Categorical Data with Maximum Modularity

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.37

Lazhar Labiod, M. Nadif

Citations: 42

An Efficient Greedy Method for Unsupervised Feature Selection

2011 IEEE 11th International Conference on Data Mining Pub Date: 2011-12-11 DOI: 10.1109/ICDM.2011.22

Ahmed K. Farahat, A. Ghodsi, M. Kamel

Abstract: In data mining applications, data instances are typically described by a huge number of features. Most of these features are irrelevant or redundant, which negatively affects the efficiency and effectiveness of different learning algorithms. Selecting relevant features is a crucial task that can give a better understanding of the data or improve the performance of other learning tasks. Although feature selection has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection that efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. It then presents a novel algorithm that greedily minimizes the reconstruction error based on the features selected so far. The greedy algorithm relies on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison to state-of-the-art methods for unsupervised feature selection.

Citations: 87
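The abstract scores a feature subset by how well it reconstructs the data matrix and grows that subset greedily. Assuming the criterion is the squared Frobenius norm of the residual left after projecting every feature column onto the span of the selected columns, a brute-force greedy version can be sketched as below. Each feature is given as a column (a list of values over all instances). This sketch recomputes the full error for every candidate, which is exactly the cost the paper's recursive formula is meant to avoid; `greedy_select` and its helpers are hypothetical names.

```python
def _residual(v, basis):
    """Remove from v its components along an orthonormal basis (Gram-Schmidt)."""
    r = list(v)
    for q in basis:
        c = sum(ri * qi for ri, qi in zip(r, q))
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def reconstruction_error(columns, basis):
    """Squared Frobenius norm of the part of the data the basis cannot reconstruct."""
    return sum(sum(x * x for x in _residual(col, basis)) for col in columns)


def greedy_select(columns, k, tol=1e-12):
    """Pick up to k feature columns, each time the one that most lowers the error."""
    selected, basis = [], []
    for _ in range(k):
        best, best_err, best_q = None, None, None
        for j, col in enumerate(columns):
            if j in selected:
                continue
            r = _residual(col, basis)
            norm = sum(x * x for x in r) ** 0.5
            if norm < tol:
                continue  # column already lies in the span of the selected ones
            q = [x / norm for x in r]
            err = reconstruction_error(columns, basis + [q])
            if best_err is None or err < best_err:
                best, best_err, best_q = j, err, q
        if best is None:
            break  # every remaining column is redundant
        selected.append(best)
        basis.append(best_q)
    return selected
```

A column that is a multiple of an already selected one adds nothing to the span, so the greedy loop skips it; this is how the criterion discards redundant features without any class labels.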