F. Martínez-Álvarez, A. T. Lora, José Cristóbal Riquelme Santos, J. Aguilar-Ruiz
{"title":"LBF: A Labeled-Based Forecasting Algorithm and Its Application to Electricity Price Time Series","authors":"F. Martínez-Álvarez, A. T. Lora, José Cristóbal Riquelme Santos, J. Aguilar-Ruiz","doi":"10.1109/ICDM.2008.129","DOIUrl":"https://doi.org/10.1109/ICDM.2008.129","url":null,"abstract":"A new approach is presented in this work with the aim of predicting time series behaviors. A previous labeling of the samples is obtained utilizing clustering techniques and the forecasting is applied using the information provided by the clustering. Thus, the whole data set is discretized with the labels assigned to each data point and the main novelty is that only these labels are used to predict the future behavior of the time series, avoiding using the real values of the time series until the process ends. The results returned by the algorithm, however, are not labels but the nominal value of the point that is required to be predicted. The algorithm based on labeled (LBF) has been tested in several energy-related time series and a notable improvement in the prediction has been achieved.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125379720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ReDSOM: Relative Density Visualization of Temporal Changes in Cluster Structures Using Self-Organizing Maps","authors":"Denny, Graham J. Williams, P. Christen","doi":"10.1109/ICDM.2008.34","DOIUrl":"https://doi.org/10.1109/ICDM.2008.34","url":null,"abstract":"We introduce a self-organizing map (SOM) based visualization method that compares cluster structures in temporal datasets using relative density SOM (ReDSOM) visualization. Our method, combined with a distance matrix-based visualization, is capable of visually identifying emerging clusters, disappearing clusters, enlarging clusters, contracting clusters, the shifting of cluster centroids, and changes in cluster density. For example, when a region in a SOM becomes significantly more dense compared to an earlier SOM, and well separated from other regions, then the new region can be said to represent a new cluster. The capabilities of ReDSOM are demonstrated using synthetic datasets, as well as real-life datasets from the World Bank and the Australian Taxation Office. The results on the real-life datasets demonstrate that changes identified interactively can be related to actual changes. The identification of such cluster changes is important in many contexts, including the exploration of changes in population behavior in the context of compliance and fraud in taxation.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shohei Hido, Yuta Tsuboi, H. Kashima, Masashi Sugiyama, T. Kanamori
{"title":"Inlier-Based Outlier Detection via Direct Density Ratio Estimation","authors":"Shohei Hido, Yuta Tsuboi, H. Kashima, Masashi Sugiyama, T. Kanamori","doi":"10.1109/ICDM.2008.49","DOIUrl":"https://doi.org/10.1109/ICDM.2008.49","url":null,"abstract":"We propose a new statistical approach to the problem of inlier-based outlier detection, i.e.,finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score; we estimate the ratio directly in a semi-parametric fashion without going through density estimation. Thus our approach is expected to have better performance in high-dimensional problems. Furthermore, the applied algorithm for density ratio estimation is equipped with a natural cross-validation procedure, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. The algorithm offers a closed-form solution as well as a closed-form formula for the leave-one-out error. Thanks to this, the proposed outlier detection method is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129054484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-label Classification Using Ensembles of Pruned Sets","authors":"J. Read, B. Pfahringer, G. Holmes","doi":"10.1109/ICDM.2008.74","DOIUrl":"https://doi.org/10.1109/ICDM.2008.74","url":null,"abstract":"This paper presents a pruned sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account correlations between labels. By pruning these sets, PS focuses only on the most important correlations, which reduces complexity and improves accuracy. By combining pruned sets in an ensemble scheme (EPS), new label sets can be formed to adapt to irregular or complex data. The results from experimental evaluation on a variety of multi-label datasets show that [E]PS can achieve better performance and train much faster than other multi-label methods.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126932485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Assent, Ralph Krieger, Emmanuel Müller, T. Seidl
{"title":"INSCY: Indexing Subspace Clusters with In-Process-Removal of Redundancy","authors":"I. Assent, Ralph Krieger, Emmanuel Müller, T. Seidl","doi":"10.1109/ICDM.2008.46","DOIUrl":"https://doi.org/10.1109/ICDM.2008.46","url":null,"abstract":"Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of projections is exponential in the number of dimensions, efficiency is crucial. Moreover, the resulting subspace clusters are often highly redundant, i.e. many clusters are detected multiply in several projections. We propose a novel index for efficient subspace clustering in a novel depth-first processing with in-process-removal of redundant clusters for better pruning. Thorough experiments on real and synthetic data show that INSCY yields substantial efficiency and quality improvements.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126686728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Good Itemsets by Packing Data","authors":"Nikolaj Tatti, Jilles Vreeken","doi":"10.1109/ICDM.2008.39","DOIUrl":"https://doi.org/10.1109/ICDM.2008.39","url":null,"abstract":"The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first one is a simple greedy approach that builds a family of itemsets directly from data. The second one, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high quality descriptions of the data.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"385 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134210207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning","authors":"Min-Ling Zhang, Zhi-Hua Zhou","doi":"10.1109/ICDM.2008.27","DOIUrl":"https://doi.org/10.1109/ICDM.2008.27","url":null,"abstract":"Multi-instance multi-label learning (MIML) deals with the problem where each training example is associated with not only multiple instances but also multiple class labels. Previous MIML algorithms work by identifying its equivalence in degenerated versions of multi-instance multi-label learning. However, useful information encoded in training examples may get lost during the identification process. In this paper, a maximum margin method is proposed for MIML which directly exploits the connections between instances and labels. The learning task is formulated as a quadratic programming (QP) problem and implemented in its dual form. Applications to scene classification and text categorization show that the proposed approach achieves superior performance over existing MIML methods.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133158757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Alternative Clusterings Using Constraints","authors":"I. Davidson, Zijie Qi","doi":"10.1109/ICDM.2008.141","DOIUrl":"https://doi.org/10.1109/ICDM.2008.141","url":null,"abstract":"The aim of data mining is to find novel and actionable insights. However, most algorithms typically just find a single explanation of the data even though alternatives could exist. In this work, we explore a general purpose approach to find an alternative clustering of the data with the aid of must-link and cannot-link constraints. This problem has received little attention in the literature and since our approach can be incorporated into the many clustering algorithms that use a distance function, compares favorably with existing work.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"627 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133180760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Future Decision Trees from Evolving Data","authors":"Mirko Böttcher, M. Spott, R. Kruse","doi":"10.1109/ICDM.2008.90","DOIUrl":"https://doi.org/10.1109/ICDM.2008.90","url":null,"abstract":"Recognizing and analyzing change is an important human virtue because it enables us to anticipate future scenarios and thus allows us to act pro-actively. One approach to understand change within a domain is to analyze how models and patterns evolve. Knowing how a model changes over time is suggesting to ask: Can we use this knowledge to learn a model in anticipation, such that it better reflects the near-future characteristics of an evolving domain? In this paper we provide an answer to this question by presenting an algorithm which predicts future decision trees based on a model of change. In particular, this algorithm encompasses a novel approach to change mining which is based on analyzing the changes of the decisions made during model learning. The proposed approach can also be applied to other types of classifiers and thus provides a basis for future research. We present our first experimental results which show that anticipated decision trees have the potential to outperform trees learned on the most recent data.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123261355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Method to Mine Frequent Subsequences from Graph Sequence Data","authors":"Akihiro Inokuchi, T. Washio","doi":"10.1109/ICDM.2008.106","DOIUrl":"https://doi.org/10.1109/ICDM.2008.106","url":null,"abstract":"In recent years, the mining of a complete set of frequent subgraphs from labeled graph data has been extensively studied.However, to our best knowledge, almost no methods have been proposed to find frequent subsequences of graphs from a set of graph sequences. In this paper, we define a novel class of graph subsequences by introducing axiomatic rules of graph transformation, their admissibility constraints and a union graph. Then we propose an efficient approach named \"GTRACE'' to enumerate frequent transformation subsequences (FTSs) of graphs from a given set of graph sequences. Its fundamental performance has been evaluated by using artificial datasets, and its practicality has been confirmed through the experiments using real world datasets.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121992134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}