{"title":"A Logical Formulation of the Granular Data Model","authors":"T. Fan, C. Liau, T. Lin, Karen Lee","doi":"10.1504/IJGCRSIS.2010.029583","DOIUrl":"https://doi.org/10.1504/IJGCRSIS.2010.029583","url":null,"abstract":"In data mining problems, data is usually provided in the form of data tables. To represent knowledge discovered from data tables, decision logic (DL) is proposed in rough set theory. While DL is an instance of propositional logic, we can also describe data tables by other logical formalisms. In this paper, we use a kind of many-sorted logic, called attribute value-sorted logic, to study association rule mining from the perspective of granular computing. By using a logical formulation, it is easy to show that patterns are properties of classes of isomorphic data tables. We also show that a granular data model can act as a canonical model of a class of isomorphic data tables. Consequently, association rule mining can be restricted to such granular data models.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131057659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geographic Knowledge Discovery in INGENS: An Inductive Database Perspective","authors":"A. Appice, A. Ciampi, A. Lanza, D. Malerba, Antonella Rapolla, Luisa Vetturi","doi":"10.1109/ICDMW.2008.120","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.120","url":null,"abstract":"INGENS is a prototype of GIS which integrates a geographic knowledge discovery engine to mine several kinds of spatial KDD objects from the topographic maps stored in a spatial database. In this paper we describe the main principles of an inductive spatial database in INGENS. Inductive database allows to keep permanent KDD objects and integrate database technology with systems for the geographic knowledge generation. In contrast to traditional spatial database technology, inductive database allows to answer queries which require synthesizing and applying plausible knowledge which is generated by (inductive) inference from both spatial objects and KDD objects (prior knowledge) stored in the same database.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125937241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Driven Data Mining (D3M)","authors":"Longbing Cao","doi":"10.1109/ICDMW.2008.98","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.98","url":null,"abstract":"In deploying data mining into the real-world business, we have to cater for business scenarios, organizational factors, user preferences and business needs. However, the current data mining algorithms and tools often stop at the delivery of patterns satisfying expected technical interestingness. Business people are not informed about how and what to do to take over the technical deliverables. The gap between academia and business has seriously affected the widespread employment of advanced data mining techniques in greatly promoting enterprise operational quality and productivity. To narrow down the gap, cater for real-world factors relevant to data mining, and make data mining workable in supporting decision-making actions in the real world, we propose the methodology of domain driven data mining (D3M for short). D3M aims to construct next-generation methodologies, techniques and tools for a possible paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge delivery. In this talk, we address the concept map of D3M, theoretical underpinnings, several general and flexible frameworks, research issues, possible directions, application areas etc. related to D3M. Real-world case studies in financial data mining and social security mining are demonstrated to show the effectiveness and applicability of D3M in both research and development of real-world challenging problems.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130028739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A KDD Platform Based on the Application Service Provider Paradigm","authors":"Fabio Fumarola, E. Salvemini, D. Malerba","doi":"10.1109/ICDMW.2008.100","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.100","url":null,"abstract":"Nowadays, small and medium enterprises (SMEs) are forced to compete on a global market and to make strategic decisions in short periods of time. In order to allow SMEs access to information technologies which can support their competition on a global scale, public administrations are fostering the setting up of digital districts. In this paper, we describe a distributed collaborative data mining platform, named KD-ASP, developed for a digital district. It is based on the application service provider (ASP) paradigm, which allows SMEs to access data mining services over a network and to cut down costs for their acquisition, upgrading and maintenance. KD-ASP allows the users to collaborate on the design of a knowledge discovery process whose execution is then delegated to a workflow engine. Tasks involved in a process are classified as data selection, pre-processing, data transformation, data mining and data visualization, and are made available as Web services.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128870759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavior Informatics and Analytics: Let Behavior Talk","authors":"Longbing Cao","doi":"10.1109/ICDMW.2008.95","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.95","url":null,"abstract":"Behavior is increasingly recognized as a key component in business intelligence and problem-solving. Different from traditional behavior analysis, which mainly focus on implicit behavior and explicit business appearance as a result of business usage and customer demographics, this paper proposes the field of Behavior Informatics and Analytics (BIA), to support explicit behavior involvement through a conversion from transactional data to behavioral data, and further genuine analysis of native behavior patterns and impacts. BIA consists of key components including behavior modeling and representation, behavioral data construction, behavior impact modeling, behavior pattern analysis, and behavior presentation. BIA can greatly complement the existing means for combined, more informative and social patterns and solutions for critical problem-solving in areas such as dealing with customer-officer interaction, counter-terrorism and monitoring online communities.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"335 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121685221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Hierarchical Clustering on Market Basket Data","authors":"Baoying Wang, Qin Ding, Imad Rahal","doi":"10.1109/ICDMW.2008.32","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.32","url":null,"abstract":"Data clustering has been proven to be a promising data mining technique. Recently, there have been many attempts for clustering market-basket data. In this paper, we propose a parallelized hierarchical clustering approach on market-basket data (PH-Clustering), which is implemented using MPI. Based on the analysis of the major clustering steps, we adopt a partial local and partial global approach to decrease the computation time meanwhile keeping communication time at minimum. Load balance issue is always considered especially at data partitioning stage. Our experimental results demonstrate that PH-Clustering speeds up the sequential clustering with a great magnitude. The larger the data size, the more significant the speedup when the number of processors is large. Our results also show that the number of items has more impact on the performance of PH-Clustering than the number of transactions.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122712193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RE-SPaM: Using Regular Expressions for Sequential Pattern Mining in Trajectory Databases","authors":"Leticia I. Gómez, A. Vaisman","doi":"10.1109/ICDMW.2008.14","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.14","url":null,"abstract":"In sequential pattern mining, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In these languages, REs are applied over items. We propose a much more powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints over the attributes of the items. Expressions in this language may include attributes, functions over attributes, and variables. We present the data model, sketch the syntax and semantics of RE-SPaM, a set of examples, and suggest how it can be used in the mining process.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117051849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Reliability Estimates for Individual Predictions in Data Streams","authors":"P. Rodrigues, João Gama, Z. Bosnić","doi":"10.1109/ICDMW.2008.123","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.123","url":null,"abstract":"Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy which has low or no information at all about the estimated error of each individual prediction. In many sensitive applications, users should be allowed to associate a measure of reliability to each prediction. In the case of batch systems, reliability measures have already been defined, mostly empirical measures as the estimation using the local sensitivity analysis. However, with the advent of data streams, these reliability estimates should also be computed online, based only on available data and current model's state. In this paper we define empirical measures to perform online estimation of reliability of individual predictions when made in the context of online learning systems. We present preliminary results and evaluate the estimators in two different problems.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132144432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ZCS Revisited: Zeroth-Level Classifier Systems for Data Mining","authors":"F. Tzima, P. Mitkas","doi":"10.1109/ICDMW.2008.83","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.83","url":null,"abstract":"Learning classifier systems (LCS) are machine learning systems designed to work for both multi-step and single-step decision tasks. The latter case presents an interesting, though not widely studied, challenge for such algorithms, especially when they are applied to real-world data mining problems. The present investigation departs from the popular approach of applying accuracy-based LCS to data mining problems and aims to uncover the potential of strength-based LCS in such tasks. In this direction, ZCS-DM, a Zeroth-level Classifier System for data mining, is applied to a series of real-world classification problems and its performance is compared to that of other state-of-the-art machine learning techniques (C4.5, HIDER and XCS). Results are encouraging, since with only a modest parameter exploration phase, ZCS-DM manages to outperform its rival algorithms in eleven out of the twelve benchmark datasets used in this study. We conclude this work by identifying future research directions.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127354214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of Discriminative Features from Hyperspectral Data","authors":"H. Kalkan, Y. Yardimci","doi":"10.1109/ICDMW.2008.40","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.40","url":null,"abstract":"This paper presents a method to discover the discriminative patterns or features in hyperspectral data for classification. The proposed method searches the data space along both spectral and spatial frequency axes and combines the adjacent spectral and spatial frequency bands so that a simpler but more effective feature set is achieved. The algorithm is tested on hyperspectral images of hazelnut kernels. The detected features were evaluated for classifying contaminated and uncontaminated hazelnut kernels. The developed algorithm is adaptive, robust and can be applied to other types of hyperspectral data.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130478118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}