Seventh IEEE International Conference on Data Mining (ICDM 2007): Latest Papers

Incorporating User Provided Constraints into Document Clustering
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.67
Yanhua Chen, M. Rege, Ming Dong, Jing Hua
Abstract: Document clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose SS-NMF: a semi-supervised non-negative matrix factorization framework for document clustering. In SS-NMF, users are able to provide supervision for document clustering in terms of pairwise constraints on a few documents specifying whether they "must" or "cannot" be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the document-document similarity matrix to infer the document clusters. Theoretically, we show that SS-NMF provides a general framework for semi-supervised clustering and that existing approaches can be considered as special cases of SS-NMF. Through extensive experiments conducted on publicly available data sets, we demonstrate the superior performance of SS-NMF for clustering documents.
Citations: 41
Training Conditional Random Fields by Periodic Step Size Adaptation for Large-Scale Text Mining
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.39
Han-Shen Huang, Yu-Ming Chang, Chun-Nan Hsu
Abstract: For applications with consecutive incoming training examples, on-line learning has the potential to achieve a likelihood as high as off-line learning without scanning all available training examples and usually has a much smaller memory footprint. To train CRFs on-line, this paper presents the Periodic Step size Adaptation (PSA) method to dynamically adjust the learning rates in stochastic gradient descent. We applied our method to three large scale text mining tasks. Experimental results show that PSA outperforms the best off-line algorithm, L-BFGS, by many hundred times, and outperforms the best on-line algorithm, SMD, by an order of magnitude in terms of the number of passes required to scan the training data set.
Citations: 11
Sampling for Sequential Pattern Mining: From Static Databases to Data Streams
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.82
Chedy Raïssi, P. Poncelet
Abstract: Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases tend to grow larger and the hypothesis that a database can be loaded into main memory for sequential pattern mining purposes is no longer valid. Furthermore, the new model of data as a continuous and potentially infinite flow, known as the data stream model, calls for a pre-processing step to ease the mining operations. Since the database size is the most influential factor for mining algorithms, we examine the use of sampling over static databases to get approximate mining results with an upper bound on the error rate. Moreover, we extend this sampling analysis and present an algorithm based on reservoir sampling to cope with sequential pattern mining over data streams. We demonstrate with empirical results that our sampling methods are efficient and that sequence mining remains accurate over static databases and data streams.
Citations: 30
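The abstract above builds on reservoir sampling to handle streams of unknown length. As a point of reference, here is a minimal sketch of the classic single-pass reservoir sampling scheme (often called Algorithm R). It is not the paper's algorithm: the function name `reservoir_sample` and its parameters are illustrative, and the paper's extension to sequence databases and its error bounds are not reproduced here.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, seed=42)
print(len(sample))
```

Only the reservoir is held in memory, which is why this kind of sampling suits data that cannot be loaded in full.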
The Chosen Few: On Identifying Valuable Patterns
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.85
Björn Bringmann, Albrecht Zimmermann
Abstract: Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information contained in such pattern sets, we propose a general heuristic approach for selecting a small subset of patterns. We identify several selection techniques for use in this general algorithm and evaluate those on several data sets. The results show that the technique succeeds in severely reducing the number of patterns, while at the same time apparently retaining much of the original information. Additionally the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the approach is very well suited for the goals we aim at.
Citations: 100
ORIGAMI: Mining Representative Orthogonal Graph Patterns
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.45
M. Hasan, V. Chaoji, Saeed Salem, J. Besson, Mohammed J. Zaki
Abstract: In this paper, we introduce the concept of alpha-orthogonal patterns to mine a representative set of graph patterns. Intuitively, two graph patterns are alpha-orthogonal if their similarity is bounded above by alpha. Each alpha-orthogonal pattern is also a representative for those patterns that are at least beta similar to it. Given user defined alpha, beta ∈ [0,1], the goal is to mine an alpha-orthogonal, beta-representative set that minimizes the set of unrepresented patterns. We present ORIGAMI, an effective algorithm for mining the set of representative orthogonal patterns. ORIGAMI first uses a randomized algorithm to randomly traverse the pattern space, seeking previously unexplored regions, to return a set of maximal patterns. ORIGAMI then extracts an alpha-orthogonal, beta-representative set from the mined maximal patterns. We show the effectiveness of our algorithm on a number of real and synthetic datasets. In particular, we show that our method is able to extract high quality patterns even in cases where existing enumerative graph mining methods fail to do so.
Citations: 84
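The two definitions in the abstract above (alpha-orthogonality and beta-representativeness) can be made concrete with a small sketch. The similarity measure is an assumption: the abstract does not say which graph similarity is used, so Jaccard similarity over edge sets stands in for it here. The names `jaccard`, `is_alpha_orthogonal`, and `unrepresented` are illustrative and are not part of ORIGAMI.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
Pattern = FrozenSet[Edge]  # a graph pattern reduced to its edge set (simplifying assumption)


def jaccard(p: Pattern, q: Pattern) -> float:
    """Assumed similarity: shared edges divided by all edges."""
    union = p | q
    return len(p & q) / len(union) if union else 1.0


def is_alpha_orthogonal(selected: Sequence[Pattern], alpha: float,
                        sim: Callable[[Pattern, Pattern], float] = jaccard) -> bool:
    """True if every pair of selected patterns has similarity at most alpha."""
    return all(sim(p, q) <= alpha for p, q in combinations(selected, 2))


def unrepresented(selected: Sequence[Pattern], candidates: Sequence[Pattern], beta: float,
                  sim: Callable[[Pattern, Pattern], float] = jaccard) -> List[Pattern]:
    """Candidates that are not selected and not at least beta-similar to any selected pattern."""
    return [c for c in candidates
            if c not in selected and all(sim(c, r) < beta for r in selected)]


p1 = frozenset({("a", "b"), ("b", "c")})
p2 = frozenset({("c", "d"), ("d", "e")})
p3 = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
print(is_alpha_orthogonal([p1, p2], alpha=0.2))  # p1 and p2 share no edges
print(len(unrepresented([p1, p2], [p1, p2, p3], beta=0.6)))
```

The stated goal then amounts to choosing `selected` so that the first check holds while the list returned by the second is as small as possible.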
A Support Vector Approach to Censored Targets
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.93
Pannagadatta K. Shivaswamy, Wei Chu, Martin Jansche
Abstract: Censored targets, such as the time to events in survival analysis, can generally be represented by intervals on the real line. In this paper, we propose a novel support vector technique (named SVCR) for regression on censored targets. SVCR inherits the strengths of support vector methods, such as a globally optimal solution by convex programming, fast training speed and strong generalization capacity. In contrast to ranking approaches to survival analysis, our approach is able not only to achieve superior ordering performance, but also to predict the survival time very well. Experiments show a significant performance improvement when the majority of the training data is censored. Experimental results on several survival analysis datasets demonstrate that SVCR is very competitive against classical survival analysis models.
Citations: 143
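To illustrate the idea of representing censored targets as intervals, the sketch below scores a prediction against an interval target: no penalty inside the interval, and a linear penalty for falling outside it. This is an assumption made for illustration, not the SVCR objective, whose exact loss and regularization are not given in the abstract. The name `interval_loss` is hypothetical.

```python
import math


def interval_loss(pred: float, lower: float, upper: float) -> float:
    """Linear penalty for a prediction that falls outside the target interval [lower, upper].

    Use -math.inf or math.inf for a bound that is unknown because of censoring;
    an uncensored target is the degenerate interval lower == upper.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(0.0, lower - pred) + max(0.0, pred - upper)


# Right-censored observation: the event happened some time after t = 5.
print(interval_loss(7.0, 5.0, math.inf))  # prediction is consistent with the interval
print(interval_loss(3.0, 5.0, math.inf))  # prediction is too early
```

Because each term is a maximum of linear functions of the prediction, the penalty is convex in the prediction, which is in keeping with the convex programming the abstract refers to.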
Multilevel Belief Propagation for Fast Inference on Markov Random Fields
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.9
L. Xiong, Fei Wang, Changshui Zhang
Abstract: Graph-based inference plays an important role in many mining and learning tasks. Among all the solvers for this problem, belief propagation (BP) provides a general and efficient way to derive approximate solutions. However, for large scale graphs the computational cost of BP is still demanding. In this paper, we propose a multilevel algorithm to accelerate belief propagation on Markov Random Fields (MRF). First, we coarsen the original graph to get a smaller one. Then, BP is applied on the new graph to get a coarse result. Finally the coarse solution is efficiently refined back to derive the original solution. Unlike traditional multi-resolution approaches, our method features adaptive coarsening and efficient refinement. The above process can be recursively applied to reduce the computational cost remarkably. We theoretically justify the feasibility of our method on Gaussian MRFs, and empirically show that it is also effectual on discrete MRFs. The effectiveness of our method is verified in experiments on various inference tasks.
Citations: 12
Failure Prediction in IBM BlueGene/L Event Logs
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.46
Yinglung Liang, Yanyong Zhang, Hui Xiong, R. Sahoo
Abstract: Frequent failures are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and complexity. In order to develop effective fault-tolerant strategies, there is a critical need to predict failure events. To this end, we have collected detailed event logs from IBM BlueGene/L, which has 128K processors, and is currently the fastest supercomputer in the world. In this study, we first show how the event records can be converted into a data set that is appropriate for running classification techniques. Then we apply classifiers on the data, including RIPPER (a rule-based classifier), Support Vector Machines (SVMs), a traditional Nearest Neighbor method, and a customized Nearest Neighbor method. We show that the customized nearest neighbor approach can outperform RIPPER and SVMs in terms of both coverage and precision. The results suggest that the customized nearest neighbor approach can be used to alleviate the impact of failures.
Citations: 240
Cocktail Ensemble for Regression
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.60
Yang Yu, Zhi-Hua Zhou, K. Ting
Abstract: This paper is motivated to improve the performance of individual ensembles using a hybrid mechanism in the regression setting. Based on an error-ambiguity decomposition, we formally analyze the optimal linear combination of two base ensembles, which is then extended to multiple individual ensembles via pairwise combinations. The Cocktail ensemble approach is proposed based on this analysis. Experiments over a broad range of data sets show that the proposed approach outperforms the individual ensembles, two other methods of ensemble combination, and two state-of-the-art regression approaches.
Citations: 24
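The optimal linear combination of two regressors mentioned in the abstract above can be worked out from the standard error-ambiguity identity for squared error: for a combined prediction w*a + (1-w)*b, the error equals w*err_a + (1-w)*err_b - w*(1-w)*d, where d is the mean squared disagreement between a and b. Setting the derivative with respect to w to zero gives w = 1/2 + (err_b - err_a) / (2*d). The sketch below applies that formula. The function names, the clipping of w to [0, 1], and estimating the errors on the supplied data are illustrative choices, not details taken from the paper.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared difference between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def optimal_pair_weight(pred_a: Sequence[float], pred_b: Sequence[float],
                        y: Sequence[float]) -> float:
    """Weight on pred_a that minimises the squared error of w*a + (1-w)*b on (pred, y)."""
    err_a = mse(pred_a, y)
    err_b = mse(pred_b, y)
    disagreement = mse(pred_a, pred_b)
    if disagreement == 0.0:
        return 0.5  # identical predictions: any weight gives the same result
    w = 0.5 + (err_b - err_a) / (2.0 * disagreement)
    return min(1.0, max(0.0, w))  # clipping keeps the result a convex combination


def combine(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Weighted average of two prediction vectors."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


y = [1.0, 2.0, 3.0, 4.0]
a = [1.5, 2.5, 3.5, 4.5]  # consistently too high
b = [0.5, 1.5, 2.5, 3.5]  # consistently too low
w = optimal_pair_weight(a, b, y)
print(w, mse(combine(a, b, w), y))
```

The error is a convex quadratic in w, so the combined error at the chosen weight is never worse than that of either input on the data used to estimate it.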
Prism: A Primal-Encoding Approach for Frequent Sequence Mining
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.33
K. Gouda, M. Hassaan, Mohammed J. Zaki
Abstract: Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach called Prism, for mining frequent sequences. Prism utilizes a vertical approach for enumeration and support counting, based on the novel notion of prime block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods like SPADE [10], PrefixSpan [6] and SPAM [2], by an order of magnitude or more.
Citations: 49
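The general idea behind encoding sets with prime factorization, which the abstract above refers to, can be sketched as follows: each position in a fixed-size block is assigned a distinct prime, a set of positions is stored as the product of its primes, and the intersection of two sets is the greatest common divisor of their codes. The block size of 8, the particular primes, and the function names are assumptions for illustration. The data structures Prism actually uses are not described in the abstract and are not reproduced here.

```python
from math import gcd
from typing import Iterable, List

# One distinct prime per position in an assumed 8-position block.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def encode_block(positions: Iterable[int]) -> int:
    """Encode a set of positions (0..7) as the product of their primes; the empty set is 1."""
    code = 1
    for pos in set(positions):
        code *= PRIMES[pos]
    return code


def decode_block(code: int) -> List[int]:
    """Recover the positions whose primes divide the code."""
    return [i for i, p in enumerate(PRIMES) if code % p == 0]


def intersect_blocks(code_a: int, code_b: int) -> int:
    """The GCD keeps exactly the primes common to both codes, i.e. the set intersection."""
    return gcd(code_a, code_b)


a = encode_block([0, 2, 5])  # 2 * 5 * 13 = 130
b = encode_block([2, 5, 7])  # 5 * 13 * 19 = 1235
print(decode_block(intersect_blocks(a, b)))  # → [2, 5]
```

Because every integer has a unique prime factorization, the encoding is lossless, and a single GCD replaces an element-by-element set comparison.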