Discovery of functional relationships in multi-relational data using inductive logic programming
Alexessander Alves, Rui Camacho, Eugénio C. Oliveira
Fourth IEEE International Conference on Data Mining (ICDM'04), November 2004. DOI: 10.1109/ICDM.2004.10053

ILP systems have been widely applied to data mining classification tasks with considerable success. Their use in regression tasks has been far less successful: current systems have very limited numerical reasoning capabilities, which limits the application of ILP to the discovery of functional relationships of a numeric nature. This paper proposes improvements to the numerical reasoning capabilities of ILP systems for regression tasks. It proposes the use of statistical techniques such as model validation and model selection to improve noise handling, and it introduces a search-stopping criterion based on the PAC method for evaluating learning performance. We found these extensions essential to improving on the results of the machine learning and statistical algorithms used in the empirical evaluation study.
Active feature-value acquisition for classifier induction
Prem Melville, M. Saar-Tsechansky, F. Provost, R. Mooney
Fourth IEEE International Conference on Data Mining (ICDM'04), November 2004. DOI: 10.1109/ICDM.2004.10075

Many induction problems include missing data that can be acquired at a cost. When building accurate predictive models, acquiring complete information for all instances is often expensive or unnecessary, while acquiring information for a random subset of instances may not be the most effective approach. Active feature-value acquisition tries to reduce the cost of achieving a desired model accuracy by identifying the instances for which obtaining complete information is most informative. We present an approach in which instances are selected for acquisition based on the current model's accuracy and its confidence in its predictions. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than alternative policies.
{"title":"Integrating multi-objective genetic algorithms into clustering for fuzzy association rules mining","authors":"Mehmet Kaya, R. Alhajj","doi":"10.1109/ICDM.2004.10050","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10050","url":null,"abstract":"In this paper, we propose an automated method to decide on the number of fuzzy sets and for the autonomous mining of both fuzzy sets and fuzzy association rules. We compare the proposed multiobjective GA based approach with: 1) CURE based approach; 2) Chien et al. (2001) clustering approach. Experimental results on JOOK transactions extracted from the adult data of United States census in year 2000 show that the proposed method exhibits good performance over the other two approaches in terms of runtime, number of large itemsets and number of association rules.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125344249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On closed constrained frequent pattern mining","authors":"F. Bonchi, C. Lucchese","doi":"10.1109/ICDM.2004.10093","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10093","url":null,"abstract":"Constrained frequent patterns and closed frequent patterns are two paradigms aimed at reducing the set of extracted patterns to a smaller, more interesting, subset. Although a lot of work has been done with both these paradigms, there is still confusion around the mining problem obtained by joining closed and constrained frequent patterns in a unique framework. In this paper, we shed light on this problem by providing a formal definition and a thorough characterization. We also study computational issues and show how to combine the most recent results in both paradigms, providing a very efficient algorithm which exploits the two requirements (satisfying constraints and being closed) together at mining time in order to reduce the computation as much as possible.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126628644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view clustering","authors":"S. Bickel, T. Scheffer","doi":"10.1109/ICDM.2004.10095","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10095","url":null,"abstract":"We consider clustering problems in which the available attributes can be split into two independent subsets, such that either subset suffices for learning. Example applications of this multi-view setting include clustering of Web pages which have an intrinsic view (the pages themselves) and an extrinsic view (e.g., anchor texts of inbound hyperlinks); multi-view learning has so far been studied in the context of classification. We develop and study partitioning and agglomerative, hierarchical multi-view clustering algorithms for text data. We find empirically that the multi-view versions of k-means and EM greatly improve on their single-view counterparts. By contrast, we obtain negative results for agglomerative hierarchical multi-view clustering. Our analysis explains this surprising phenomenon.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121564914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Filling-in missing objects in orders","authors":"Toshihiro Kamishima, S. Akaho","doi":"10.1109/ICDM.2004.10047","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10047","url":null,"abstract":"Filling-in techniques are important, since missing values frequently appear in real data. Such techniques have been established for categorical or numerical values. Though lists of ordered objects are widely used as representational forms (e.g., Web search results, best-seller lists), filling-in techniques for orders have received little attention. We therefore propose a simple but effective technique to fill-in missing objects in orders. We built this technique into our collaborative filtering system.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124188676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decision tree evolution using limited number of labeled data items from drifting data streams","authors":"W. Fan, Yi-an Huang, Philip S. Yu","doi":"10.1109/ICDM.2004.10026","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10026","url":null,"abstract":"Most previously proposed mining methods on data streams make an unrealistic assumption that \"labelled\" data stream is readily available and can be mined at anytime. However, in most real-world problems, labelled data streams are rarely immediately available. Due to this reason, models are reconstructed only when labelled data become available periodically. This passive stream mining model has several drawbacks. We propose a concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified by paying an affordable cost. When the estimated loss is more than a tolerable threshold, the model evolves by using a small number of instances with verified true class labels. Previous work on active mining concentrates on error guess and estimation. In this paper, we discuss several approaches on decision tree evolution.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128070575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Density connected clustering with local subspace preferences","authors":"C. Böhm, K. Murthy, H. Kriegel, Peer Kröger","doi":"10.1109/ICDM.2004.10087","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10087","url":null,"abstract":"Many clustering algorithms tend to break down in high-dimensional feature spaces, because the clusters often exist only in specific subspaces (attribute subsets) of the original feature space. Therefore, the task of projected clustering (or subspace clustering) has been defined recently. As a solution to tackle this problem, we propose the concept of local subspace preferences, which captures the main directions of high point density. Using this concept, we adopt density-based clustering to cope with high-dimensional data. In particular, we achieve the following advantages over existing approaches: Our proposed method has a determinate result, does not depend on the order of processing, is robust against noise, performs only one single scan over the database, and is linear in the number of dimensions. A broad experimental evaluation shows that our approach yields results of significantly better quality than recent work on clustering high-dimensional data.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128145868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning weighted naive Bayes with accurate ranking","authors":"Harry Zhang, Shengli Sheng","doi":"10.1109/ICDM.2004.10030","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10030","url":null,"abstract":"Naive Bayes is one of most effective classification algorithms. In many applications, however, a ranking of examples are more desirable than just classification. How to extend naive Bayes to improve its ranking performance is an interesting and useful question in practice. Weighted naive Bayes is an extension of naive Bayes, in which attributes have different weights. This paper investigates how to learn a weighted naive Bayes with accurate ranking from data, or more precisely, how to learn the weights of a weighted naive Bayes to produce accurate ranking. We explore various methods: the gain ratio method, the hill climbing method, and the Markov chain Monte Carlo method, the hill climbing method combined with the gain ratio method, and the Markov chain Monte Carlo method combined with the gain ratio method. Our experiments show that a weighted naive Bayes trained to produce accurate ranking outperforms naive Bayes.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127942218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Divide and prosper: comparing models of customer behavior from populations to individuals","authors":"Tianyi Jiang, A. Tuzhilin","doi":"10.1109/ICDM.2004.10013","DOIUrl":"https://doi.org/10.1109/ICDM.2004.10013","url":null,"abstract":"This paper compares customer segmentation, 1-to-1, and aggregate marketing approaches across a broad range of experimental settings, including multiple segmentation levels, marketing datasets, dependent variables, and different types of classifiers, segmentation techniques, and predictive measures. Our experimental results show that, overall, 1-to-1 modeling significantly outperforms the aggregate approach among high-volume customers and is never worse than aggregate approach among low-volume customers. Moreover, the best segmentation techniques tend to outperform 1-to-l modeling among low-volume customers.","PeriodicalId":325511,"journal":{"name":"Fourth IEEE International Conference on Data Mining (ICDM'04)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131013036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}