{"title":"An Information Theoretic Approach to Detection of Minority Subsets in Database","authors":"S. Ando, Einoshin Suzuki","doi":"10.1109/ICDM.2006.19","DOIUrl":"https://doi.org/10.1109/ICDM.2006.19","url":null,"abstract":"Detection of rare and exceptional occurrences in large- scale databases have become an important practice in the field of knowledge discovery and information retrieval. Many databases include large amount of noise or irrelevant data, whose distribution often overlaps with the subsets of exceptional data containing useful knowledge. This paper addresses the problem of finding a small subset of \"minority\" data whose distribution overlaps with, but are exceptional to or inconsistent with that of the majority of the database. In such a case, conventional distance-based or density-based approaches in Outlier Detection are ineffective due to their dependence on the structure of the majority or the prerequisite of critical parameters. We formalize the task as an estimation of a model of the minority subset which provides a simple description of the subset and yet maintains divergence from that of the majority. This estimation is formalized as a minimization problem using an information theoretic framework of Rate Distortion theory. We further introduce conditions of the majority to derive an objective function which factorizes the property of the minority and dependence to the structure of the majority. The proposed method shows improvements from conventional approaches in artificial data and a promising result in document retrieval problem.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121808601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Yet Effective Data Clustering Algorithm","authors":"S. Vadapalli, Satyanarayana R. Valluri, K. Karlapalem","doi":"10.1109/ICDM.2006.9","DOIUrl":"https://doi.org/10.1109/ICDM.2006.9","url":null,"abstract":"In this paper, we use a simple concept based on k-reverse nearest neighbor digraphs, to develop a framework RECORD for clustering and outlier detection. We developed three algorithms - (i) RECORD algorithm (requires one parameter), (ii) Agglomerative RECORD algorithm (no parameters required) and (iii) Stability-based RECORD algorithm (no parameters required). Our experimental results with published datasets, synthetic and real-life datasets show that RECORD not only handles noisy data, but also identifies the relevant clusters. Our results are as good as (if not better than) the results got from other algorithms.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116928930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COSMIC: Conceptually Specified Multi-Instance Clusters","authors":"H. Kriegel, A. Pryakhin, Matthias Schubert, A. Zimek","doi":"10.1109/ICDM.2006.46","DOIUrl":"https://doi.org/10.1109/ICDM.2006.46","url":null,"abstract":"Recently, more and more applications represent data objects as sets of feature vectors or multi-instance objects. In this paper, we propose COSMIC, a method for deriving concept lattices from multi-instance data based on hierarchical density-based clustering. The found concepts correspond to groups or clusters of multi-instance objects having similar instances in common. We demonstrate that COSMIC outperforms compared methods with respect to efficiency and cluster quality and is capable to extract interesting patterns in multi-instance data sets.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127978791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bregman Bubble Clustering: A Robust, Scalable Framework for Locating Multiple, Dense Regions in Data","authors":"Gunjan Gupta, Joydeep Ghosh","doi":"10.1109/ICDM.2006.32","DOIUrl":"https://doi.org/10.1109/ICDM.2006.32","url":null,"abstract":"In traditional clustering, every data point is assigned to at least one cluster. On the other extreme, one class clustering algorithms proposed recently identify a single dense cluster and consider the rest of the data as irrelevant. However, in many problems, the relevant data forms multiple natural clusters. In this paper, we introduce the notion of Bregman bubbles and propose Bregman bubble clustering (BBC) that seeks k dense Bregman bubbles in the data. We also present a corresponding generative model, soft BBC, and show several connections with Bregman clustering, and with a one class clustering algorithm. Empirical results on various datasets show the effectiveness of our method.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132814870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Blocking: Learning to Scale Up Record Linkage","authors":"M. Bilenko, B. Kamath, R. Mooney","doi":"10.1109/ICDM.2006.13","DOIUrl":"https://doi.org/10.1109/ICDM.2006.13","url":null,"abstract":"Many data mining tasks require computing similarity between pairs of objects. Pairwise similarity computations are particularly important in record linkage systems, as well as in clustering and schema mapping algorithms. Because the number of object pairs grows quadratically with the size of the dataset, computing similarity between all pairs is impractical and becomes prohibitive for large datasets and complex similarity functions. Blocking methods alleviate this problem by efficiently selecting approximately similar object pairs for subsequent distance computations, leaving out the remaining pairs as dissimilar. Previously proposed blocking methods require manually constructing an index- based similarity function or selecting a set of predicates, followed by hand-tuning of parameters. In this paper, we introduce an adaptive framework for automatically learning blocking functions that are efficient and accurate. We describe two predicate-based formulations of learnable blocking functions and provide learning algorithms for training them. The effectiveness of the proposed techniques is demonstrated on real and simulated datasets, on which they prove to be more accurate than non-adaptive blocking methods.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133049729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendation on Item Graphs","authors":"Fei Wang, Shengchao Ma, Liuzhong Yang, Ta-Hsin Li","doi":"10.1109/ICDM.2006.133","DOIUrl":"https://doi.org/10.1109/ICDM.2006.133","url":null,"abstract":"A novel scheme for item-based recommendation is proposed in this paper. In our framework, the items are described by an undirected weighted graph Q = (V,epsiv). V is the node set which is identical to the item set, and epsiv is the edge set. Associate with each edge eij isin epsiv is a weight omegaij ges 0, which represents similarity between items i and j. Without the loss of generality, we assume that any user's ratings to the items should be sufficiently smooth with respect to the intrinsic structure of the items, i.e., a user should give similar ratings to similar items. A simple algorithm is presented to achieve such a smooth solution. Encouraging experimental results are provided to show the effectiveness of our method.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133388438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Regional Association Rule Mining in Spatial Datasets","authors":"W. Ding, C. Eick, Jing Wang, Xiaojing Yuan","doi":"10.1109/ICDM.2006.5","DOIUrl":"https://doi.org/10.1109/ICDM.2006.5","url":null,"abstract":"The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132103891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Reference-Based Approach to Outlier Detection in Large Datasets","authors":"Yaling Pei, Osmar R Zaiane, Yong Gao","doi":"10.1109/ICDM.2006.17","DOIUrl":"https://doi.org/10.1109/ICDM.2006.17","url":null,"abstract":"A bottleneck to detecting distance and density based outliers is that a nearest-neighbor search is required for each of the data points, resulting in a quadratic number of pairwise distance evaluations. In this paper, we propose a new method that uses the relative degree of density with respect to a fixed set of reference points to approximate the degree of density defined in terms of nearest neighbors of a data point. The running time of our algorithm based on this approximation is 0(Rn log n) where n is the size of dataset and R is the number of reference points. Candidate outliers are ranked based on the outlier score assigned to each data point. Theoretical analysis and empirical studies show that our method is effective, efficient, and highly scalable to very large datasets.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132587818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Method for Detecting Outlying Subspaces in High-dimensional Databases Using Genetic Algorithm","authors":"Ji Zhang, Q. Gao, Hai H. Wang","doi":"10.1109/ICDM.2006.6","DOIUrl":"https://doi.org/10.1109/ICDM.2006.6","url":null,"abstract":"Detecting outlying subspaces is a relatively new research problem in outlier-ness analysis for high-dimensional data. An outlying subspace for a given data point p is the sub- space in which p is an outlier. Outlying subspace detection can facilitate a better characterization process for the detected outliers. It can also enable outlier mining for high- dimensional data to be performed more accurately and efficiently. In this paper, we proposed a new method using genetic algorithm paradigm for searching outlying subspaces efficiently. We developed a technique for efficiently computing the lower and upper bounds of the distance between a given point and its kth nearest neighbor in each possible subspace. These bounds are used to speed up the fitness evaluation of the designed genetic algorithm for outlying subspace detection. We also proposed a random sampling technique to further reduce the computation of the genetic algorithm. The optimal number of sampling data is specified to ensure the accuracy of the result. We show that the proposed method is efficient and effective in handling outlying subspace detection problem by a set of experiments conducted on both synthetic and real-life datasets.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126797445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dimension Reduction for Supervised Ordering","authors":"Toshihiro Kamishima, S. Akaho","doi":"10.1109/ICDM.2006.53","DOIUrl":"https://doi.org/10.1109/ICDM.2006.53","url":null,"abstract":"Ordered lists of objects are widely used as representational forms. Such ordered objects include Web search results and best-seller lists. Techniques for processing such ordinal data are being developed, particularly methods for a supervised ordering task: i.e., learning functions used to sort objects from sample orders. In this article, we propose two dimension reduction methods specifically designed to improve prediction performance in a supervised ordering task.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115963231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}