Sixth International Conference on Data Mining (ICDM'06): Latest Publications

Mining Maximal Generalized Frequent Geographic Patterns with Knowledge Constraints
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.110
V. Bogorny, J. Valiati, S. D. S. Camargo, P. Engel, B. Kuijpers, L. Alvares
Abstract: In frequent geographic pattern mining, a large number of patterns are well known a priori. This paper presents a novel approach for mining frequent geographic patterns without associations that are previously known to be non-interesting. Geographic dependences are eliminated during frequent set generation using prior knowledge. After the dependence elimination, maximal generalized frequent sets are computed to remove redundant frequent sets. Experimental results show a significant reduction in both the number of frequent sets and the computational time for mining maximal frequent geographic patterns.
Citations: 26
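The two steps named in this abstract (pruning known dependences while frequent sets are generated, then keeping only maximal sets) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it handles plain maximal sets rather than generalized ones, and the function names, the toy rows, and the `gas_station`/`street` dependence are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, known_pairs=frozenset()):
    """Level-wise (Apriori-style) search for frequent itemsets.

    Candidates containing any pair from known_pairs are never generated,
    so well-known dependences are pruned during frequent-set generation.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = []
    while level:
        # Keep the candidates whose relative support reaches the threshold.
        kept = [c for c in level
                if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(kept)
        # Join frequent k-sets into (k+1)-candidates, skipping known pairs.
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            if any(frozenset(p) in known_pairs for p in combinations(union, 2)):
                continue
            candidates.add(union)
        level = list(candidates)
    return frequent


def maximal(itemsets):
    """Keep only itemsets that have no proper superset in the collection."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


# Hypothetical data: each row lists the feature types found near one district.
rows = [
    {"street", "gas_station", "school"},
    {"street", "gas_station", "school"},
    {"street", "school", "river"},
    {"street", "gas_station", "river"},
]
# Assumed prior knowledge: gas stations lie on streets, so the pair is uninteresting.
known = {frozenset({"gas_station", "street"})}

all_frequent = frequent_itemsets(rows, min_support=0.5, known_pairs=known)
maximal_sets = maximal(all_frequent)
```

Because the known pair is excluded when candidates are joined, no larger set containing it is ever counted, which is where the reduction in both set count and work would come from in this toy setting.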
Dirichlet Aspect Weighting: A Generalized EM Algorithm for Integrating External Data Fields with Semantically Structured Queries by Using Gradient Projection Method
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.55
A. Velivelli, Thomas S. Huang
Abstract: In this paper we address the problem of document retrieval with semantically structured queries, that is, queries where each term has a tagged field label. We introduce the Dirichlet Aspect Weighting model, which integrates terms from external databases into the query language model in a Bayesian learning framework. For this model, the Dirichlet prior distribution is governed by parameters which depend on the number of fields in the external databases. The model requires the semantically structured query to be augmented with additional examples, which are obtained using pseudo relevance feedback. We formulate a log-likelihood function for the Dirichlet Aspect Weighting model and maximize it using a novel Generalized EM algorithm. A comparison of the results of the Dirichlet Aspect Weighting model on the TREC 2005 Genomics Track dataset with baseline methods using pseudo relevance feedback, while incorporating terms from external databases, shows an improvement.
Citations: 0
Belief Propagation in Large, Highly Connected Graphs for 3D Part-Based Object Recognition
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.26
F. DiMaio, J. Shavlik
Abstract: We describe a part-based object-recognition framework, specialized to mining complex 3D objects from detailed 3D images. Objects are modeled as a collection of parts together with a pairwise potential function. An efficient inference algorithm, based on belief propagation (BP), finds the optimal layout of parts, given some input image. We introduce AggBP, a message aggregation scheme for BP, in which groups of messages are approximated as a single message. For objects consisting of N parts, we reduce CPU time and memory requirements from O(N^2) to O(N). We apply AggBP to synthetic data as well as to a real-world task: identifying protein fragments in three-dimensional images. These experiments show that our improvements result in minimal loss of accuracy in significantly less time.
Citations: 10
An Experimental Investigation of Graph Kernels on a Collaborative Recommendation Task
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.18
François Fouss, Luh Yen, A. Pirotte, M. Saerens
Abstract: This work presents a systematic comparison of seven kernels (or similarity matrices) on a graph, namely the exponential diffusion kernel, the Laplacian diffusion kernel, the von Neumann kernel, the regularized Laplacian kernel, the commute time kernel, and finally the Markov diffusion kernel and the cross-entropy diffusion matrix (both introduced in this paper), on a collaborative recommendation task involving a database. The database is viewed as a graph where elements are represented as nodes and relations as links between nodes. From this graph, seven kernels are computed, leading to a set of meaningful proximity measures between nodes that make it possible to answer questions about the structure of the graph under investigation, in particular to recommend items to users. Cross-validation results indicate that a simple nearest-neighbours rule based on the similarity measure provided by the regularized Laplacian, the Markov diffusion and the commute time kernels performs best. We therefore recommend the use of the commute time kernel for computing similarities between elements of a database, for two reasons: (1) it has an appealing interpretation in terms of random walks and (2) no parameter needs to be adjusted.
Citations: 129
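The commute time kernel recommended in this abstract is commonly computed as the Moore-Penrose pseudoinverse of the graph Laplacian L = D - A, which is why it has no parameter to tune. The sketch below assumes that standard definition. The small user-item graph, the node numbering, and the `recommend` helper are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def commute_time_kernel(adjacency):
    """Return the pseudoinverse of the graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L)


def recommend(kernel, user, candidates):
    """Rank candidate nodes by decreasing kernel similarity to the user node."""
    return sorted(candidates, key=lambda node: -kernel[user, node])


# Hypothetical bipartite graph: nodes 0-2 are users, nodes 3-6 are items.
edges = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 6)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

K = commute_time_kernel(A)
unseen = [5, 6]  # items that user 0 is not yet linked to
ranking = recommend(K, user=0, candidates=unseen)
```

Ranking by the kernel entry is one simple way to turn the similarity matrix into suggestions. The nearest-neighbours rule evaluated in the paper may differ in detail.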
Diverse Topic Phrase Extraction through Latent Semantic Analysis
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.61
Jilin Chen, Jun Yan, Benyu Zhang, Qiang Yang, Zheng Chen
Abstract: We propose a novel algorithm for extracting diverse topic phrases in order to provide a summary of large corpora. Previous works often ignore the importance of diversity and thus extract phrases crowded around a few hot topics while failing to cover other less obvious but important topics. We solve this problem through document re-weighting and phrase diversification using latent semantic analysis (LSA). Experiments on various datasets show that our new algorithm can improve relevance as well as diversity across topics in topic phrase extraction problems.
Citations: 13
Cluster Analysis of Time-Series Medical Data Based on the Trajectory Representation and Multiscale Comparison Techniques
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.33
S. Hirano, S. Tsumoto
Abstract: This paper presents a cluster analysis method for multidimensional time-series data on clinical laboratory examinations. Our method represents the time series of test results as trajectories in multidimensional space, and compares their structural similarity using the multiscale comparison technique. It enables us to find the part-to-part correspondences between two trajectories, taking into account the relationships between different tests. The resultant dissimilarity can then be used with clustering algorithms to find groups of similar cases. The method was applied to the cluster analysis of Albumin-Platelet data in the chronic hepatitis dataset. The results demonstrated that it could form interesting groups of cases that have high correspondence to the fibrotic stages.
Citations: 35
Turning Clusters into Patterns: Rectangle-Based Discriminative Data Description
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.163
Byron J. Gao, M. Ester
Abstract: The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Yet not all data mining methods produce such readily understandable knowledge; for example, most clustering algorithms output sets of points as clusters. In this paper, we perform a systematic study of cluster description that generates interpretable patterns from clusters. We introduce and analyze novel description formats leading to more expressive power, and we motivate and define novel description problems specifying different trade-offs between interpretability and accuracy. We also present effective heuristic algorithms together with their empirical evaluations.
Citations: 19
Semantic Smoothing for Model-based Document Clustering
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.142
Xiaodan Zhang, Xiaohua Zhou, Xiaohua Hu
Abstract: A document is often full of class-independent "general" words and short of class-specific "core" words, which makes document clustering difficult. We argue that both problems will be relieved by suitable smoothing of document models in agglomerative approaches and of cluster models in partitional approaches, which in turn improves clustering quality. To the best of our knowledge, most model-based clustering approaches use Laplacian smoothing to prevent zero probabilities, while most similarity-based approaches employ the heuristic TF*IDF scheme to discount the effect of "general" words. Inspired by a series of statistical translation language models for text retrieval, we propose in this paper a novel smoothing method, referred to as context-sensitive semantic smoothing, for document clustering. Comparative experiments on three datasets show that model-based clustering approaches with semantic smoothing are effective in improving cluster quality.
Citations: 26
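For orientation, the baseline that this abstract contrasts itself with, Laplacian (additive) smoothing of a unigram cluster model, can be sketched in a few lines of Python. This is only the standard add-alpha estimate and not the context-sensitive semantic smoothing the paper proposes. The vocabulary, the tokens, and the function name are hypothetical.

```python
from collections import Counter


def laplace_smoothed_model(tokens, vocabulary, alpha=1.0):
    """Unigram model p(w) = (count(w) + alpha) / (N + alpha * |V|)."""
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    denom = total + alpha * len(vocabulary)
    # Every vocabulary word gets a non-zero probability, seen or not.
    return {w: (counts[w] + alpha) / denom for w in vocabulary}


# Hypothetical cluster: "market" never occurs but must not get probability zero.
vocab = {"gene", "protein", "cell", "market"}
cluster_tokens = ["gene", "protein", "gene", "cell"]
model = laplace_smoothed_model(cluster_tokens, vocab)
```

Additive smoothing spreads the reserved probability mass uniformly over the vocabulary, regardless of context. That uniform treatment is the limitation a semantic smoothing method would aim to address.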
Adaptive Kernel Principal Component Analysis with Unsupervised Learning of Kernels
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.14
Daoqiang Zhang, Zhi-Hua Zhou, Songcan Chen
Abstract: Choosing an appropriate kernel is one of the key problems in kernel-based methods. Most existing kernel selection methods require that the class labels of the training examples be known. In this paper, we propose an adaptive kernel selection method for kernel principal component analysis, which can effectively learn the kernels when the class labels of the training examples are not available. By iteratively optimizing a novel criterion, the proposed method can achieve nonlinear feature extraction and unsupervised kernel learning simultaneously. Moreover, a non-iterative approximate algorithm is developed. The effectiveness of the proposed algorithms is validated on UCI datasets and the COIL-20 object recognition database.
Citations: 14
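To make the kernel-choice problem concrete, here is a minimal sketch of ordinary kernel PCA with a Gaussian (RBF) kernel whose width `sigma` is fixed by hand. It is the standard textbook procedure, not the adaptive method of the paper, and it shows exactly the parameter that an unsupervised kernel-learning criterion would have to choose. The function name and the random data are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel_pca(X, sigma, n_components):
    """Project training points onto the leading kernel principal components."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances, then the Gaussian kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Centre the kernel matrix in feature space.
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # Eigen-decomposition; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # The projection of training point i on component k is sqrt(lambda_k) * v_k[i].
    return vecs * np.sqrt(np.maximum(vals, 0.0)), vals


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # hypothetical unlabeled data
Z, eigenvalues = rbf_kernel_pca(X, sigma=1.0, n_components=2)
```

Changing `sigma` changes the kernel matrix and therefore the extracted features, which is why selecting it without class labels is a problem in its own right.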
How Bayesians Debug
Sixth International Conference on Data Mining (ICDM'06) Pub Date : 2006-12-18 DOI: 10.1109/ICDM.2006.83
Chao Liu, Zeng Lian, Jiawei Han
Abstract: Manual debugging is expensive, and its high cost has motivated extensive research on automated fault localization in both the software engineering and data mining communities. Fault localization aims at automatically locating likely fault locations, and hence assists manual debugging. A number of fault localization algorithms have been developed in recent years, and they prove effective when multiple failing and passing cases are available. However, we notice that what is more commonly encountered in practice is the two-sample debugging problem, where only one failing and one passing case are available. This problem has been either overlooked or insufficiently tackled in previous studies. In this paper, we develop a new fault localization algorithm, named BayesDebug, which simulates some manual debugging principles through a Bayesian approach. Unlike existing approaches that base fault analysis on multiple passing and failing cases, BayesDebug requires only one passing and one failing case. We reason about why BayesDebug fits the two-sample debugging problem and why other approaches do not. Finally, an experiment with a real-world program, grep-2.2, is conducted, which exemplifies the effectiveness of BayesDebug.
Citations: 29