Sixth International Conference on Data Mining (ICDM'06)最新文献

筛选
英文 中文
Boosting the Feature Space: Text Classification for Unstructured Data on the Web 增强特征空间:Web上非结构化数据的文本分类
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.31
Yang Song, Ding Zhou, Jian Huang, Isaac G. Councill, H. Zha, C. Lee Giles
{"title":"Boosting the Feature Space: Text Classification for Unstructured Data on the Web","authors":"Yang Song, Ding Zhou, Jian Huang, Isaac G. Councill, H. Zha, C. Lee Giles","doi":"10.1109/ICDM.2006.31","DOIUrl":"https://doi.org/10.1109/ICDM.2006.31","url":null,"abstract":"The issue of seeking efficient and effective methods for classifying unstructured text in large document corpora has received much attention in recent years. Traditional document representation like bag-of-words encodes documents as feature vectors, which usually leads to sparse feature spaces with large dimensionality, thus making it hard to achieve high classification accuracies. This paper addresses the problem of classifying unstructured documents on the Web. A classification approach is proposed that utilizes traditional feature reduction techniques along with a collaborative filtering method for augmenting document feature spaces. The method produces feature spaces with an order of magnitude less features compared with a baseline bag-of-words feature selection method. Experiments on both real-world data and benchmark corpus indicate that our approach improves classification accuracy over the traditional methods for both support vector machines and AdaBoost classifiers.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"46 Pt 3-4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127523883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Stability Region Based Expectation Maximization for Model-based Clustering 基于稳定域的模型聚类期望最大化
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.152
C. Reddy, H. Chiang, B. Rajaratnam
{"title":"Stability Region Based Expectation Maximization for Model-based Clustering","authors":"C. Reddy, H. Chiang, B. Rajaratnam","doi":"10.1109/ICDM.2006.152","DOIUrl":"https://doi.org/10.1109/ICDM.2006.152","url":null,"abstract":"In spite of the initialization problem, the expectation-maximization (EM) algorithm is widely used for estimating the parameters in several data mining related tasks. Most popular model-based clustering techniques might yield poor clusters if the parameters are not initialized properly. To reduce the sensitivity of initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability- reTaining Equilibra CHaracterization) to compute neighborhood local maxima on likelihood surface using stability regions. Basically, our method coalesces the advantages of the traditional EM with that of the dynamic and geometric characteristics of the stability regions of the corresponding nonlinear dynamical system of the log-likelihood function. Two phases namely, the EM phase and the stability region phase, are repeated alternatively in the parameter space to achieve improvements in the maximum likelihood. Though applied to Gaussian mixtures in this paper, our technique can be easily generalized to any other parametric finite mixture model. The algorithm has been tested on both synthetic and real datasets and the improvements in the performance compared to other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115766098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Fast Relevance Discovery in Time Series 时间序列中的快速相关性发现
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.71
Chang-Shing Perng, Haixun Wang, Sheng Ma
{"title":"Fast Relevance Discovery in Time Series","authors":"Chang-Shing Perng, Haixun Wang, Sheng Ma","doi":"10.1109/ICDM.2006.71","DOIUrl":"https://doi.org/10.1109/ICDM.2006.71","url":null,"abstract":"In this paper, we propose to model time series from a new angle: state transition points. When fluctuation of values in a time series crosses a certain point, it may trigger state transition in the system, which may lead to abrupt changes in many other time series. The concept of state transition points is essential in understanding the behavior of the time series and the behavior of the system. The new measure is robust and is capable of discovering correlations that Pearson's coefficient cannot reveal. We propose efficient algorithms to identify state transition points and to compute correlation between two time series. We also introduce some triangular inequalities to efficiently find highly correlated time series among many time series.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124356661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Resource Management for Networked Classifiers in Distributed Stream Mining Systems 分布式流挖掘系统中网络分类器的资源管理
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.136
D. Turaga, O. Verscheure, U. Chaudhari, Lisa Amini
{"title":"Resource Management for Networked Classifiers in Distributed Stream Mining Systems","authors":"D. Turaga, O. Verscheure, U. Chaudhari, Lisa Amini","doi":"10.1109/ICDM.2006.136","DOIUrl":"https://doi.org/10.1109/ICDM.2006.136","url":null,"abstract":"Networks of classifiers are capturing the attention of system and algorithmic researchers because they offer improved accuracy over single model classifiers, can be distributed over a network of servers for improved scalability, and can be adapted to available system resources. This work provides a principled approach for the optimized allocation of system resources across a networked chain of classifiers. We begin with an illustrative example of how complex classification tasks can be decomposed into a network of binary classifiers. We formally define a global performance metric by recursively collapsing the chain of classifiers into one combined classifier. The performance metric trades off the end-to-end probabilities of detection and false alarm, both of which depend on the resources allocated to each individual classifier. We formulate the optimization problem and present optimal resource allocation results for both simulated and state-of-the-art classifier chains operating on telephony data.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"422 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117348513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Anytime Classification Using the Nearest Neighbor Algorithm with Applications to Stream Mining 基于最近邻算法的任意时间分类及其在流挖掘中的应用
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.21
Ken Ueno, X. Xi, Eamonn J. Keogh, Dah-Jye Lee
{"title":"Anytime Classification Using the Nearest Neighbor Algorithm with Applications to Stream Mining","authors":"Ken Ueno, X. Xi, Eamonn J. Keogh, Dah-Jye Lee","doi":"10.1109/ICDM.2006.21","DOIUrl":"https://doi.org/10.1109/ICDM.2006.21","url":null,"abstract":"For many real world problems we must perform classification under widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have from milliseconds to minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how we can convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification, or if given the luxury of additional time, can utilize the extra time to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129897728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 115
TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases TOP-COP:大型数据库中TOP-K强相关对的挖掘
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.161
Hui Xiong, Mark Brodie, Sheng Ma
{"title":"TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases","authors":"Hui Xiong, Mark Brodie, Sheng Ma","doi":"10.1109/ICDM.2006.161","DOIUrl":"https://doi.org/10.1109/ICDM.2006.161","url":null,"abstract":"Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end, we propose an alternative task: mining the top-k strongly correlated pairs. In this paper, we identify a 2-D monotone property of an upper bound of Pearson's correlation coefficient and develop an efficient algorithm, called TOP-COP to exploit this property to effectively prune many pairs even without computing their correlation coefficients. Our experimental results show that the TOP-COP algorithm can be orders of magnitude faster than brute-force alternatives for mining the top-k strongly correlated pairs.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130989284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems 基于单类支持向量机分类器集成的有效载荷异常检测系统
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.165
R. Perdisci, G. Gu, Wenke Lee
{"title":"Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems","authors":"R. Perdisci, G. Gu, Wenke Lee","doi":"10.1109/ICDM.2006.165","DOIUrl":"https://doi.org/10.1109/ICDM.2006.165","url":null,"abstract":"Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequency in the payload of n-grams. A simple model of normal traffic is then constructed according to this description of the packets' content. It has been demonstrated that anomaly detectors based on payload statistics can be \"evaded\" by mimicry attacks using byte substitution and padding techniques. In this paper we propose a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade. We propose a new technique to extract the features from the payload. We use a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space. Accuracy and hardness of evasion are obtained by constructing our anomaly-based IDS using an ensemble of one-class SVM classifiers that work on different feature spaces.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131334760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 243
GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space GraphRank:特征空间中显著子图的统计建模和挖掘
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.79
Huahai He, Ambuj K. Singh
{"title":"GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space","authors":"Huahai He, Ambuj K. Singh","doi":"10.1109/ICDM.2006.79","DOIUrl":"https://doi.org/10.1109/ICDM.2006.79","url":null,"abstract":"We propose a technique for evaluating the statistical significance of frequent subgraphs in a database. A graph is represented by a feature vector that is a histogram over a set of basis elements. The set of basis elements is chosen based on domain knowledge and consists generally of vertices, edges, or small graphs. A given subgraph is transformed to a feature vector and the significance of the subgraph is computed by considering the significance of occurrence of the corresponding vector. The probability of occurrence of the vector in a random vector is computed based on the prior probability of the basis elements. This is then used to obtain a probability distribution on the support of the vector in a database of random vectors. The statistical significance of the vector/subgraph is then defined as the p-value of its observed support. We develop efficient methods for computing p-values and lower bounds. A simplified model is further proposed to improve the efficiency. We also address the problem of feature vector mining, a generalization of item- set mining where counts are associated with items and the goal is to find significant sub-vectors. We present an algorithm that explores closed frequent sub-vectors to find significant ones. Experimental results show that the proposed techniques are effective, efficient, and useful for ranking frequent subgraphs by their statistical significance.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121542322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 58
Temporal Data Mining in Dynamic Feature Spaces 动态特征空间中的时态数据挖掘
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.157
B. Wenerstrom, C. Giraud-Carrier
{"title":"Temporal Data Mining in Dynamic Feature Spaces","authors":"B. Wenerstrom, C. Giraud-Carrier","doi":"10.1109/ICDM.2006.157","DOIUrl":"https://doi.org/10.1109/ICDM.2006.157","url":null,"abstract":"Many interesting real-world applications for temporal data mining are hindered by concept drift. One particular form of concept drift is characterized by changes to the underlying feature space. Seemingly little has been done in this area. This paper presents FAE, an incremental ensemble approach to mining data subject to such concept drift. Empirical results on large data streams demonstrate promise.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126263629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 43
Speedup Clustering with Hierarchical Ranking 用层次排序加速聚类
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.151
Jianjun Zhou, J. Sander
{"title":"Speedup Clustering with Hierarchical Ranking","authors":"Jianjun Zhou, J. Sander","doi":"10.1109/ICDM.2006.151","DOIUrl":"https://doi.org/10.1109/ICDM.2006.151","url":null,"abstract":"Many clustering algorithms in particular hierarchical clustering algorithms do not scale-up well for large data-sets especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speedup OPTICS by trading accuracy for speed.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115871650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信