2008 Eighth IEEE International Conference on Data Mining最新文献

筛选
英文 中文
A Randomized Approach for Approximating the Number of Frequent Sets 一种近似频繁集数目的随机化方法
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.85
Mario Boley, H. Grosskreutz
{"title":"A Randomized Approach for Approximating the Number of Frequent Sets","authors":"Mario Boley, H. Grosskreutz","doi":"10.1109/ICDM.2008.85","DOIUrl":"https://doi.org/10.1109/ICDM.2008.85","url":null,"abstract":"We investigate the problem of counting the number of frequent (item)sets - a problem known to be intractable in terms of an exact polynomial time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we empirically show that the algorithm still has the desired accuracy on real-world datasets when its running time is capped polynomially.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122934017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows SeqStream:在流滑动窗口上挖掘封闭的顺序模式
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.36
Lei Chang, Tengjiao Wang, Dongqing Yang, Hua Luan
{"title":"SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows","authors":"Lei Chang, Tengjiao Wang, Dongqing Yang, Hua Luan","doi":"10.1109/ICDM.2008.36","DOIUrl":"https://doi.org/10.1109/ICDM.2008.36","url":null,"abstract":"Previous studies have shown mining closed patterns provides more benefits than mining the complete set of frequent patterns, since closed pattern mining leads to more compact results and more efficient algorithms. It is quite useful in a data stream environment where memory and computation power are major concerns. This paper studies the problem of mining closed sequential patterns over data stream sliding windows. A synopsis structure IST (Inverse Closed Sequence Tree) is designed to keep inverse closed sequential patterns in current window. An efficient algorithm SeqStream is developed to mine closed sequential patterns in stream windows incrementally, and various novel strategies are adopted in SeqStream to prune search space aggressively. Extensive experiments on both real and synthetic data sets show that SeqStream outperforms PrefixSpan, CloSpan and BIDE by a factor of about one to two orders of magnitude.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"27 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113931693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 43
Text Cube: Computing IR Measures for Multidimensional Text Database Analysis 文本立方体:计算多维文本数据库分析的IR度量
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.135
C. Lin, Bolin Ding, Jiawei Han, Feida Zhu, Bo Zhao
{"title":"Text Cube: Computing IR Measures for Multidimensional Text Database Analysis","authors":"C. Lin, Bolin Ding, Jiawei Han, Feida Zhu, Bo Zhao","doi":"10.1109/ICDM.2008.135","DOIUrl":"https://doi.org/10.1109/ICDM.2008.135","url":null,"abstract":"Since Jim Gray introduced the concept of rdquodata cuberdquo in 1997, data cube, associated with online analytical processing (OLAP), has become a driving engine in data warehouse industry. Because the boom of Internet has given rise to an ever increasing amount of text data associated with other multidimensional information, it is natural to propose a data cube model that integrates the power of traditional OLAP and IR techniques for text. In this paper, we propose a text-cube model on multidimensional text database and study effective OLAP over such data. Two kinds of hierarchies are distinguishable inside: dimensional hierarchy and term hierarchy. By incorporating these hierarchies, we conduct systematic studies on efficient text-cube implementation, OLAP execution and query processing. Our performance study shows the high promise of our methods.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114241065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 124
Frequent Subgraph Retrieval in Geometric Graph Databases 几何图数据库中的频繁子图检索
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.38
Sebastian Nowozin, K. Tsuda
{"title":"Frequent Subgraph Retrieval in Geometric Graph Databases","authors":"Sebastian Nowozin, K. Tsuda","doi":"10.1109/ICDM.2008.38","DOIUrl":"https://doi.org/10.1109/ICDM.2008.38","url":null,"abstract":"Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsi-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsi-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"21 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123697518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Document-Word Co-regularization for Semi-supervised Sentiment Analysis 半监督情感分析的文档-词协同正则化
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.113
Vikas Sindhwani, Prem Melville
{"title":"Document-Word Co-regularization for Semi-supervised Sentiment Analysis","authors":"Vikas Sindhwani, Prem Melville","doi":"10.1109/ICDM.2008.113","DOIUrl":"https://doi.org/10.1109/ICDM.2008.113","url":null,"abstract":"The goal of sentiment prediction is to automatically identify whether a given piece of text expresses positive or negative opinion towards a topic of interest. One can pose sentiment prediction as a standard text categorization problem, but gathering labeled data turns out to be a bottleneck. Fortunately, background knowledge is often available in the form of prior information about the sentiment polarity of words in a lexicon. Moreover, in many applications abundant unlabeled data is also available. In this paper, we propose a novel semi-supervised sentiment prediction algorithm that utilizes lexical prior knowledge in conjunction with unlabeled examples. Our method is based on joint sentiment analysis of documents and words based on a bipartite graph representation of the data. We present an empirical study on a diverse collection of sentiment prediction problems which confirms that our semi-supervised lexical models significantly outperform purely supervised and competing semi-supervised techniques.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129078938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 220
A Generative Probabilistic Model for Multi-label Classification 多标签分类的生成概率模型
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.86
Hongning Wang, Minlie Huang, Xiaoyan Zhu
{"title":"A Generative Probabilistic Model for Multi-label Classification","authors":"Hongning Wang, Minlie Huang, Xiaoyan Zhu","doi":"10.1109/ICDM.2008.86","DOIUrl":"https://doi.org/10.1109/ICDM.2008.86","url":null,"abstract":"Traditional discriminative classification method makes little attempt to reveal the probabilistic structure and the correlation within both input and output spaces. In the scenario of multi-label classification, most of the classifiers simply assume the predefined classes are independently distributed, which would definitely hinder the classification performance when there are intrinsic correlations between the classes. In this article, we propose a generative probabilistic model, the Correlated Labeling Model (CoL Model), to formulate the correlation between different classes. The CoL model is presented to capture the correlation between classes and the underlying structures via the latent random variables in a supervised manner. We develop a variational procedure to approximate the posterior distribution and employ the EM algorithm for the empirical Bayes parameter estimation. In our evaluations, the proposed model achieved promising results on various data sets.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130213689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 45
Computational Discovery of Motifs Using Hierarchical Clustering Techniques 基于层次聚类技术的基元计算发现
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.21
Dianhui Wang, Nung Kion Lee
{"title":"Computational Discovery of Motifs Using Hierarchical Clustering Techniques","authors":"Dianhui Wang, Nung Kion Lee","doi":"10.1109/ICDM.2008.21","DOIUrl":"https://doi.org/10.1109/ICDM.2008.21","url":null,"abstract":"Discovery of motifs plays a key role in understanding gene regulation in organisms. Existing tools for motif discovery demonstrate some weaknesses in dealing with reliability and scalability. Therefore, development of advanced algorithms for resolving this problem will be useful. This paper aims to develop data mining techniques for discovering motifs. A mismatch based hierarchical clustering algorithm is proposed in this paper, where three heuristic rules for classifying clusters and a post-processing for ranking and refining the clusters are employed in the algorithm. Our algorithm is evaluated using two sets of DNA sequences with comparisons. Results demonstrate that the proposed techniques in this paper outperform MEME, AlignACE and SOMBRERO for most of the testing datasets.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130159659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Learning by Propagability 可传播性学习
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.53
Bingbing Ni, Shuicheng Yan, A. Kassim, L. Cheong
{"title":"Learning by Propagability","authors":"Bingbing Ni, Shuicheng Yan, A. Kassim, L. Cheong","doi":"10.1109/ICDM.2008.53","DOIUrl":"https://doi.org/10.1109/ICDM.2008.53","url":null,"abstract":"In this paper, we present a novel feature extraction framework, called learning by propagability. The whole learning process is driven by the philosophy that the data labels and optimal feature representation can constitute a harmonic system, namely, the data labels are invariant with respect to the propagation on the similarity-graph constructed by the optimal feature representation. Based on this philosophy, a unified formulation for learning by propagability is proposed for both supervised and semi-supervised configurations. Specifically, this formulation offers the semi-supervised learning two characteristics: 1) unlike conventional semi-supervised learning algorithms which mostly include at least two parameters, this formulation is parameter-free; and 2) the formulation unifies the label propagation and optimal representation pursuing, and thus the label propagation is enhanced by benefiting from the graph constructed with the derived optimal representation instead of the original representation. Extensive experiments on UCI toy data, handwritten digit recognition, and face recognition all validate the effectiveness of our proposed learning framework compared with the state-of-the-art methods for feature extraction and semi-supervised learning.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117121260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Cost-Sensitive Parsimonious Linear Regression 代价敏感的简约线性回归
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.76
R. Goetschalckx, K. Driessens, S. Sanner
{"title":"Cost-Sensitive Parsimonious Linear Regression","authors":"R. Goetschalckx, K. Driessens, S. Sanner","doi":"10.1109/ICDM.2008.76","DOIUrl":"https://doi.org/10.1109/ICDM.2008.76","url":null,"abstract":"We examine linear regression problems where some features may only be observable at a cost (e.g., in medical domains where features may correspond to diagnostic tests that take time and costs money). This can be important in the context of data mining, in order to obtain the best predictions from the data on a limited cost budget. We define a parsimonious linear regression objective criterion that jointly minimizes prediction error and feature cost. We modify least angle regression algorithms commonly used for sparse linear regression to produce the ParLiR algorithm, which not only provides an efficient and parsimonious solution as we demonstrate empirically, but it also provides formal guarantees that we prove theoretically.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122761664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Similarity Learning for Nearest Neighbor Classification 基于相似性学习的最近邻分类
2008 Eighth IEEE International Conference on Data Mining Pub Date : 2008-12-15 DOI: 10.1109/ICDM.2008.81
A. M. Qamar, Éric Gaussier, J. Chevallet, Joo-Hwee Lim
{"title":"Similarity Learning for Nearest Neighbor Classification","authors":"A. M. Qamar, Éric Gaussier, J. Chevallet, Joo-Hwee Lim","doi":"10.1109/ICDM.2008.81","DOIUrl":"https://doi.org/10.1109/ICDM.2008.81","url":null,"abstract":"In this paper, we propose an algorithm for learning a general class of similarity measures for kNN classification. This class encompasses, among others, the standard cosine measure, as well as the Dice and Jaccard coefficients. The algorithm we propose is an extension of the voted perceptron algorithm and allows one to learn different types of similarity functions (either based on diagonal, symmetric or asymmetric similarity matrices). The results we obtained show that learning similarity measures yields significant improvements on several collections, for two prediction rules: the standard kNN rule, which was our primary goal, and a symmetric version of it.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"24 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116001428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 63
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信