2008 Eighth IEEE International Conference on Data Mining: Latest Papers, Page 10

On Locally Linear Classification by Pairwise Coupling

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.137

F. Chen, Chang-Tien Lu, Arnold P. Boedihardjo

Citations: 11

Graph-Based Rare Category Detection

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.122

Jingrui He, Yan Liu, Richard D. Lawrence

Abstract: Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays a key role in real applications such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, for cases where detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which cannot be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.

Citations: 48

Classifying High-Dimensional Text and Web Data Using Very Short Patterns

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.139

Hassan H. Malik, J. Kender

Citations: 14

Organic Pie Charts

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.64

F. Mörchen

Citations: 0

Paired Learners for Concept Drift

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.119

Stephen H. Bach, M. Maloof

Abstract: To cope with concept drift, we paired a stable online learner with a reactive one. A stable learner predicts based on all of its experience, whereas a reactive learner predicts based on its experience over a short, recent window of time. The method of paired learning uses differences in accuracy between the two learners over this window to determine when to replace the current stable learner, since the stable learner performs worse than does the reactive learner when the target concept changes. While the method uses the reactive learner as an indicator of drift, it uses the stable learner to predict, since the stable learner performs better than does the reactive learner when acquiring a target concept. Experimental results support these assertions. We evaluated the method by making direct comparisons to dynamic weighted majority, accuracy weighted ensemble, and streaming ensemble algorithm (SEA) using two synthetic problems, the Stagger concepts and the SEA concepts, and three real-world data sets: meeting scheduling, electricity prediction, and malware detection. Results suggest that, on these problems, paired learners outperformed or performed comparably to methods more costly in time and space.

Citations: 141
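
The paired-learning idea in the abstract above can be illustrated with a small Python sketch. This is not the authors' implementation: the toy `MajorityLearner`, the class name `PairedLearner`, and the `window` and `threshold` parameters are all assumptions made for illustration, and the stream carries only labels (no features) to keep the example short.

```python
from collections import Counter, deque


class MajorityLearner:
    """Toy learner (hypothetical): predicts the most frequent label it has seen."""

    def __init__(self, labels=()):
        self.counts = Counter(labels)

    def predict(self):
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, label):
        self.counts[label] += 1


class PairedLearner:
    """Sketch of pairing a stable learner with a reactive one.

    The stable learner sees all examples; the reactive learner is rebuilt
    from the last `window` examples only. Both names and the replacement
    rule below are illustrative assumptions.
    """

    def __init__(self, window=10, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.stable = MajorityLearner()
        self.recent = deque(maxlen=window)  # labels in the recent window
        self.flags = deque(maxlen=window)   # 1 where stable was wrong and reactive was right
        self.replacements = 0

    def _reactive(self):
        # Reactive learner: trained only on the recent window.
        return MajorityLearner(self.recent)

    def predict(self):
        # Predictions always come from the stable learner.
        return self.stable.predict()

    def update(self, label):
        stable_pred = self.stable.predict()
        reactive_pred = self._reactive().predict()
        self.flags.append(int(stable_pred != label and reactive_pred == label))
        self.stable.learn(label)
        self.recent.append(label)
        # Replace the stable learner when the reactive one has been
        # noticeably more accurate over the window.
        if sum(self.flags) > self.threshold * self.window:
            self.stable = self._reactive()
            self.flags.clear()
            self.replacements += 1


# A label stream whose target concept changes from "a" to "b".
stream = ["a"] * 50 + ["b"] * 30
paired = PairedLearner(window=10, threshold=0.5)
for label in stream:
    paired.update(label)
print(paired.predict(), paired.replacements)
```

In this sketch the reactive learner serves only as a drift indicator, while every prediction still comes from the stable learner, which mirrors the division of roles the abstract describes.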

Space Efficient String Mining under Frequency Constraints

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.32

J. Fischer, V. Mäkinen, Niko Välimäki

Abstract: Let D1 and D2 be two databases (i.e., multisets) of d strings, over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D1 and D2, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in the optimal linear time with the aid of suffix trees or suffix arrays. This stands in sharp contrast to other pattern domains such as item-sets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| ≪ n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because in many real-life applications space is a more critical resource than time, the aim of this article is to reduce the space, at the cost of an increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement is increased from the optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.

Citations: 29
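
To make the problem statement in the abstract above concrete, here is a brute-force Python sketch of mining patterns under frequency constraints. It is not the space-efficient suffix-array method of the paper: it enumerates every substring and is only practical for tiny inputs. The function names, the `min_d1` and `max_d2` parameters, and the reading of "frequency" as the number of strings in a database that contain the pattern are assumptions made for illustration.

```python
from collections import Counter


def substrings(s):
    """All distinct non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def document_frequency(db):
    """Map each substring to the number of strings in db that contain it."""
    freq = Counter()
    for s in db:
        freq.update(substrings(s))  # a set, so each string counts at most once
    return freq


def discriminative_patterns(d1, d2, min_d1, max_d2):
    """Patterns occurring in at least min_d1 strings of d1 and at most max_d2 strings of d2."""
    f1 = document_frequency(d1)
    f2 = document_frequency(d2)
    return sorted(p for p, c in f1.items() if c >= min_d1 and f2[p] <= max_d2)


d1 = ["abc", "abd"]
d2 = ["bcd", "cd"]
print(discriminative_patterns(d1, d2, min_d1=2, max_d2=0))  # ['a', 'ab']
```

Here "b" also occurs in both strings of `d1`, but it is excluded because it appears in one string of `d2`, which violates the `max_d2=0` constraint.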

Nonnegative Matrix Factorization for Combinatorial Optimization: Spectral Clustering, Graph Matching, and Clique Finding

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.130

C. Ding, Tao Li, Michael I. Jordan

Citations: 139

Direct Zero-Norm Optimization for Feature Selection

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.60

Kaizhu Huang, Irwin King, Michael R. Lyu

Citations: 23

TOFA: Trace Oriented Feature Analysis in Text Categorization

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.67

Jun Yan, Ning Liu, Qiang Yang, Weiguo Fan, Zheng Chen

Citations: 2

Text Mining in Radiology Reports

2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.150

Tianxia Gong, Chew Lim Tan, T. Leong, C. Lee, B. Pang, C. C. Tchoyoson Lim, Qi Tian, Suisheng Tang, Zhuo Zhang

Citations: 35