Sixth International Conference on Data Mining (ICDM'06): Latest Papers

Probabilistic Enhanced Mapping with the Generative Tabular Model
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.128
R. Priam, M. Nadif
Abstract: Visualization of massive datasets needs new methods that can quickly and easily reveal their contents. Projecting the data cloud is an interesting paradigm, although the projection is difficult to explore when the plotted points are too numerous. We therefore study a new way to show a two-dimensional projection of a multidimensional data cloud: our generative model constructs a tabular view of the projected cloud. We are able to show the high-density areas through their non-equidistributed discretization. This approach is an alternative to the self-organizing map when a projection already exists. The resulting pixel views of a dataset are illustrated by projecting a data sample of real images: it becomes possible to observe how the class labels or the frequencies of a group of modalities are laid out without getting lost, for instance because of a change in zoom level. The conclusion gives perspectives on this promising approach to obtaining a readable projection for the statistical analysis of large data samples.
Citations: 0
Improving Nearest Neighbor Classifier Using Tabu Search and Ensemble Distance Metrics
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.86
M. Tahir, Jim E. Smith
Abstract: The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In this paper, a new ensemble technique is proposed to improve the performance of the NN classifier. The proposed approach combines multiple NN classifiers, where each classifier uses a different distance function and potentially a different set of features (feature vector). These feature vectors are determined for each distance metric using a Simple Voting Scheme incorporated in Tabu Search (TS). The proposed ensemble classifier with different distance metrics and different feature vectors (TS-DF/NN) is evaluated using various benchmark data sets from the UCI Machine Learning Repository. Results indicate a significant increase in performance when compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with an ensemble classifier using different distance metrics but the same feature vector (with or without Feature Selection (FS)).
Citations: 15
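The ensemble idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' TS-DF/NN implementation: the Tabu Search step is omitted, and the feature subsets, the three distance functions, the toy data, and all names (`nn_predict`, `ensemble_predict`, `TRAIN`, `MEMBERS`) are assumptions chosen for illustration. Each member is a 1-NN classifier with its own distance function and feature subset, and the members are combined by simple majority vote.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def nn_predict(train, query, metric, features):
    """1-NN prediction that only looks at the given feature indices."""
    def project(vector):
        return [vector[i] for i in features]

    q = project(query)
    # Pick the training sample with the smallest distance to the query.
    _, label = min(
        ((metric(project(x), q), y) for x, y in train),
        key=lambda pair: pair[0],
    )
    return label


def ensemble_predict(train, query, members):
    """Majority vote over (metric, feature subset) members."""
    votes = Counter(nn_predict(train, query, m, f) for m, f in members)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature vector, class label).
TRAIN = [
    ([0.0, 0.0, 5.0], "a"),
    ([0.2, 0.1, 1.0], "a"),
    ([1.0, 1.0, 4.0], "b"),
    ([0.9, 1.2, 0.0], "b"),
]

# Hand-picked feature subsets (assumption); the paper selects them
# with Tabu Search, which this sketch does not implement.
MEMBERS = [
    (euclidean, [0, 1]),
    (manhattan, [0, 1, 2]),
    (chebyshev, [1]),
]

print(ensemble_predict(TRAIN, [0.1, 0.0, 3.0], MEMBERS))
```

An odd number of members is used here so that a two-class vote cannot tie; a real system would need an explicit tie-breaking rule.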
Incremental Mining of Frequent Query Patterns from XML Queries for Caching
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.88
Guoliang Li, Jianhua Feng, Jianyong Wang, Yong Zhang, Lizhu Zhou
Abstract: Existing studies for mining frequent XML query patterns mainly introduce a straightforward candidate generate-and-test strategy and periodically compute the frequencies of candidate query patterns from scratch by checking the entire transaction database, which consists of XML query patterns transformed from user queries. However, it is nontrivial to maintain such discovered frequent patterns in real XML databases, because frequent updates may occur that not only invalidate some existing frequent query patterns but also generate new frequent ones. Accordingly, existing proposals are inefficient when the transaction database evolves. To address these problems, this paper presents an efficient algorithm, IPS-FXQPMiner, for mining frequent XML query patterns without candidate maintenance and costly tree-containment checking. We transform XML queries into sequences through a one-to-one mapping and then mine the frequent sequences to generate frequent XML query patterns. More importantly, based on IPS-FXQPMiner, an efficient incremental algorithm, Incre-FXQPMiner, is proposed to incrementally mine frequent XML query patterns, which can minimize the I/O and computation requirements for handling incremental updates. Our experimental study on various real-life datasets demonstrates the efficiency and scalability of our algorithms over previously known alternatives.
Citations: 9
COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.37
Eric Bae, J. Bailey
Abstract: Cluster analysis has long been a fundamental task in data mining and machine learning. However, traditional clustering methods concentrate on producing a single solution, even though multiple alternative clusterings may exist. It is thus difficult for the user to validate whether the given solution is in fact appropriate, particularly for large and complex datasets. In this paper we explore the critical requirements for systematically finding a new clustering, given that an already known clustering is available, and we also propose a novel algorithm, COALA, to discover this new clustering. Our approach is driven by two important factors: dissimilarity and quality. These are especially important for finding a new clustering that is highly informative about the underlying structure of the data but is at the same time distinctively different from the provided clustering. We undertake an experimental analysis and show that our method is able to outperform existing techniques on both synthetic and real datasets.
Citations: 140
Applying Data Mining to Pseudo-Relevance Feedback for High Performance Text Retrieval
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.22
Xiangji Huang, Y. Huang, M. Wen, Aijun An, Y. Liu, Josiah Poon
Abstract: In this paper, we investigate the use of data mining, in particular text classification and co-training techniques, to identify more relevant passages based on a small set of labeled passages obtained from the blind feedback of a retrieval system. The data mining results are used to expand query terms and to re-estimate some of the parameters used in a probabilistic weighting function. We evaluate the data-mining-based feedback method on the TREC HARD data set. The results show that data mining can be successfully applied to improve text retrieval performance. We report our experimental findings in detail.
Citations: 47
Solution Path for Semi-Supervised Classification with Manifold Regularization
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.150
G. Wang, Tao Chen, D. Yeung, F. Lochovsky
Abstract: With very low extra computational cost, the entire solution path can be computed for various learning algorithms like support vector classification (SVC) and support vector regression (SVR). In this paper, we extend this promising approach to semi-supervised learning algorithms. In particular, we consider finding the solution path for the Laplacian support vector machine (LapSVM), which is a semi-supervised classification model based on manifold regularization. One advantage of this algorithm is that the coefficient path is piecewise linear with respect to the regularization parameter, hence its computational complexity is quadratic in the number of labeled examples.
Citations: 12
Dirichlet Aspect Weighting: A Generalized EM Algorithm for Integrating External Data Fields with Semantically Structured Queries by Using Gradient Projection Method
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.55
A. Velivelli, Thomas S. Huang
Abstract: In this paper we address the problem of document retrieval with semantically structured queries, that is, queries where each term has a tagged field label. We introduce the Dirichlet Aspect Weighting model, which integrates terms from external databases into the query language model in a Bayesian learning framework. For this model, the Dirichlet prior distribution is governed by parameters that depend on the number of fields in the external databases. The model requires that additional examples be added to the semantically structured query. These examples are obtained using pseudo relevance feedback. We formulate a log-likelihood function for the Dirichlet Aspect Weighting model and maximize it using a novel Generalized EM algorithm. A comparison of the Dirichlet Aspect Weighting model with baseline methods using pseudo relevance feedback on the TREC 2005 Genomics Track dataset, while incorporating terms from external databases, shows an improvement.
Citations: 0
An Experimental Investigation of Graph Kernels on a Collaborative Recommendation Task
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.18
François Fouss, Luh Yen, A. Pirotte, M. Saerens
Abstract: This work presents a systematic comparison between seven kernels (or similarity matrices) on a graph, namely the exponential diffusion kernel, the Laplacian diffusion kernel, the von Neumann kernel, the regularized Laplacian kernel, the commute time kernel, and finally the Markov diffusion kernel and the cross-entropy diffusion matrix (both introduced in this paper), on a collaborative recommendation task involving a database. The database is viewed as a graph where elements are represented as nodes and relations as links between nodes. From this graph, seven kernels are computed, leading to a set of meaningful proximity measures between nodes that make it possible to answer questions about the structure of the graph under investigation, in particular to recommend items to users. Cross-validation results indicate that a simple nearest-neighbours rule based on the similarity measure provided by the regularized Laplacian, the Markov diffusion and the commute time kernels performs best. We therefore recommend the use of the commute time kernel for computing similarities between elements of a database, for two reasons: (1) it has an appealing interpretation in terms of random walks and (2) no parameter needs to be adjusted.
Citations: 129
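As a rough illustration of the commute time kernel recommended in the abstract above, the sketch below computes it as the Moore-Penrose pseudoinverse of the graph Laplacian L = D - A. It assumes a connected, undirected graph, for which the pseudoinverse equals (L + ee^T/n)^{-1} - ee^T/n, where e is the all-ones vector. The function names, the pure-Python inversion routine, and the three-node path graph are illustrative assumptions, not code from the paper.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def commute_time_kernel(adj):
    """Pseudoinverse of the Laplacian of a connected, undirected graph.

    adj is a symmetric adjacency matrix given as a list of lists.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    # L + ee^T/n is invertible when the graph is connected.
    shifted = [
        [(degree[i] if i == j else 0.0) - adj[i][j] + 1.0 / n for j in range(n)]
        for i in range(n)
    ]
    inverse = invert(shifted)
    return [[inverse[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


# Hypothetical example: the path graph 0 - 1 - 2.
PATH = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

K = commute_time_kernel(PATH)
for row in K:
    print([round(v, 4) for v in row])
```

On this path graph, node 0 comes out more similar to its neighbour 1 than to the far node 2 (K[0][1] > K[0][2]); in a recommendation setting, items would be ranked for a user by such kernel values.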
Cluster Analysis of Time-Series Medical Data Based on the Trajectory Representation and Multiscale Comparison Techniques
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.33
S. Hirano, S. Tsumoto
Abstract: This paper presents a cluster analysis method for multidimensional time-series data on clinical laboratory examinations. Our method represents the time series of test results as trajectories in multidimensional space and compares their structural similarity by using the multiscale comparison technique. It enables us to find the part-to-part correspondences between two trajectories, taking into account the relationships between different tests. The resultant dissimilarity can be further used with clustering algorithms for finding groups of similar cases. The method was applied to the cluster analysis of Albumin-Platelet data in the chronic hepatitis dataset. The results demonstrated that it could form interesting groups of cases that have high correspondence to the fibrotic stages.
Citations: 35
Mining Maximal Generalized Frequent Geographic Patterns with Knowledge Constraints
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.110
V. Bogorny, J. Valiati, S. D. S. Camargo, P. Engel, B. Kuijpers, L. Alvares
Abstract: In frequent geographic pattern mining, a large number of patterns are well known a priori. This paper presents a novel approach for mining frequent geographic patterns without associations that are previously known to be non-interesting. Geographic dependences are eliminated during the frequent set generation using prior knowledge. After the dependence elimination, maximal generalized frequent sets are computed to remove redundant frequent sets. Experimental results show a significant reduction of both the number of frequent sets and the computational time for mining maximal frequent geographic patterns.
Citations: 26