{"title":"Association Action Rules","authors":"Z. Ras, A. Dardzinska, Li-Shiang Tsay, H. Wasyluk","doi":"10.1109/ICDMW.2008.66","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.66","url":null,"abstract":"Action rules describe possible transitions of objects from one state to another with respect to a distinguished attribute. Previous research on action rule discovery usually required the extraction of classification rules before constructing any action rule. This paper gives a new approach for generating association-type action rules. The notion of frequent action sets and an Apriori-like strategy for generating them are proposed. We introduce the notion of representative action rules and give an algorithm to construct them directly from frequent action sets. Finally, we introduce the notion of a simple association action rule and the cost of an association action rule, and give a strategy to construct simple association action rules of lowest cost.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116763055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeoDMA - A Novel System for Spatial Data Mining","authors":"T. Korting, Leila Maria Garcia Fonseca, M. Escada, F. C. Silva, M. Silva","doi":"10.1109/ICDMW.2008.22","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.22","url":null,"abstract":"Although a huge amount of remote sensing data has been provided by Earth observation satellites, few techniques for data manipulation and information extraction in large data sets have been developed. In this context, this paper presents a new system for spatial data mining and two test cases applied to land use change in the Brazilian Amazon region. We present the operational environment named GeoDMA, developed to implement such an approach.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123415316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Triggering Events from Longitudinal Data","authors":"Corrado Loglisci, D. Malerba","doi":"10.1109/ICDMW.2008.136","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.136","url":null,"abstract":"Longitudinal data consist of the repeated measurements of some variables which describe the dynamics of a domain (process or phenomenon) over time. They can be analyzed in order to explain what event may cause the transition from a state into the next one during the evolution of the domain. Generally, approaches to this explanation problem rely on the exclusive usage of domain knowledge, while an analysis driven only by data is still lacking. In this paper we describe a data mining approach to discover events which may have triggered a transition during the evolution of the domain. The original data mining task is decomposed into two consecutive subtasks. First, the sequence of discrete states which represents the dynamics of the domain is determined. Second, the triggering events for two successive states are identified. Computational solutions to both problems are presented. Their application to two real scenarios is presented and results are discussed.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"33 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123610641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages","authors":"Xinghua Li, Xindong Wu, Xuegang Hu, Fei Xie, Zhaozhong Jiang","doi":"10.1109/ICDMW.2008.122","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.122","url":null,"abstract":"This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and correlation features. A lexical chain is an external performance consistency by semantically related words of a text, and is the representation of the semantic content of a portion of the text. Word co-occurrence distribution is an important statistical model widely used in natural language processing that reflects the correlation of the words. Lexical chains and word co-occurrence are combined in this paper to extract keywords for Chinese news Web pages in our proposed algorithm KELCC. This algorithm is not domain-specific and can be applied to a single Web page without a corpus. Experiments on randomly selected Web pages have been performed to demonstrate the quality of the keywords extracted by our proposed algorithm.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128673840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extension of Partitional Clustering Methods for Handling Mixed Data","authors":"Yosr Naïja, Salem Chakhar, Kaouther Blibech Sinaoui, R. Robbana","doi":"10.1109/ICDMW.2008.85","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.85","url":null,"abstract":"Clustering is an active research topic in data mining and different methods have been proposed in the literature. Most of these methods are based on the use of a distance measure defined either on numerical attributes or on categorical attributes. However, in fields such as road traffic and medicine, datasets are composed of numerical and categorical attributes. Recently, there have been several proposals to develop clustering methods that support mixed attributes. There are three basic categories of clustering methods: partitional methods, hierarchical methods and density-based methods. This paper proposes an extension of partitional clustering methods devoted to mixed attributes. The proposed extension looks to create several partitions by using numerical attributes-based clustering methods and then chooses the one that maximizes a measure, called \"homogeneity degree\", of these partitions according to categorical attributes.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128621565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Accuracies of Rule Evaluation Models to Determine Human Criteria on Evaluated Rule Sets","authors":"H. Abe, S. Tsumoto","doi":"10.1109/ICDMW.2008.49","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.49","url":null,"abstract":"In data mining post-processing, rule selection using objective rule evaluation indices is a useful method for finding valuable knowledge in mined patterns. However, the relationship between an index value and experts' criteria has never been clarified. In this study, we have compared the accuracies of classification learning algorithms for datasets with randomized class distributions and real human evaluations. As a method to determine the relationship, we used rule evaluation models, which are learned from a dataset consisting of objective rule evaluation indices and evaluation labels for each rule. The results show that accuracies of classification learning algorithms with/without criteria of human experts are different on a balanced randomized class distribution. Given these results, we can consider a way to distinguish randomly evaluated rules using the accuracies of multiple learning algorithms.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116314998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Suspicious Behavior in Surveillance Images","authors":"Daniel Barbará, C. Domeniconi, Zoran Duric, M. Filippone, Richard Mansfield, E. Lawson","doi":"10.1109/ICDMW.2008.36","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.36","url":null,"abstract":"We introduce a novel technique to detect anomalies in images. The notion of normalcy is given by a baseline of images, under the assumption that the majority of such images are normal. The key of our approach is a featureless probabilistic representation of images, based on the length of the codeword necessary to represent each image. These codeword lengths are then used for anomaly detection based on statistical testing. Our techniques were tested on synthetic and real data sets. The results show that our approach can achieve high true positive and low false positive rates.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131903246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Reliability of Association Rules and OLAP Statistical Tests","authors":"Zhibo Chen, C. Ordonez, Kai Zhao","doi":"10.1109/ICDMW.2008.76","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.76","url":null,"abstract":"Association rules are a technique that can detect patterns within the items of a dataset. The constrained version applies several restrictions that reduce the number of rules and also help improve performance. On the other hand, OLAP statistical tests are an integration of exploratory On-Line Analytical Processing techniques and statistical tests. They use a different approach that makes them more appropriate for continuous domains and able to discover more informative patterns. In this article, we thoroughly compare the reliability of the results returned by both techniques by analyzing the metrics, such as confidence and p-value, by which these techniques are implemented in relation to the results that are generated. While these two techniques are different, we were able to bring both to level ground by extending association rules with pairing to discover more specific patterns and extending OLAP statistical tests with constraints to reduce the number of discovered patterns. We conducted our experiments on a real medical dataset and found that the extended OLAP statistical tests discovered more patterns, had comparable performance, and possessed higher reliability due to their strong statistical background.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125140474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting and Tracking Spatio-temporal Clusters with Adaptive History Filtering","authors":"J. Rosswog, K. Ghose","doi":"10.1109/ICDMW.2008.93","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.93","url":null,"abstract":"This paper addresses the problem of detecting and tracking moving clusters in spatio-temporal data sets. Spatio-temporal data sets contain data elements that move in space over time. Traditional data clustering algorithms work well on static data sets that contain well separated clusters. When traditional techniques are applied to spatio-temporal data, they break down when the moving data elements intersect the space occupied by elements from another cluster. The goal of this work is to improve the accuracy of traditional data clustering algorithms on spatio-temporal data sets. Many clustering algorithms create clusters based on the distance between the elements. We extend this distance measure to be a function of the position history of the elements. We show through a series of experiments that the use of the history-based distance measures greatly improves the performance of existing data clustering algorithms on spatio-temporal data sets. In random data sets we achieve up to a 90% improvement in cluster accuracy. To evaluate the clustering algorithms we created 102 spatio-temporal data sets. We also defined a set of metrics that are used to evaluate the performance of the clustering algorithms on the spatio-temporal data sets.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131200544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovery of Internal and External Hyperclique Patterns in Complex Graph Databases","authors":"Tsubasa Yamamoto, Tomonobu Ozaki, T. Ohkawa","doi":"10.1109/ICDMW.2008.59","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.59","url":null,"abstract":"In some applications, the whole structure of the target data can be represented naturally in \"multi-structured graphs\" that are complex graphs whose vertices consist of a set of structured data such as itemsets, sequences and so on. To catch the strong affinity relationship in multi-structured graphs, in this paper, we propose an algorithm named HFMG to discover novel and meaningful frequent patterns whose components are highly correlated with each other. HFMG mines two kinds of meaningful patterns efficiently according to which relationships we focus on. The effectiveness of the proposed algorithm is confirmed through experiments with real and synthetic datasets.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129391626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}