Fourth IEEE International Conference on Data Mining (ICDM'04): Latest Papers

GREW - a scalable frequent subgraph discovery algorithm
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10024
Michihiro Kuramochi, G. Karypis
Abstract: Existing algorithms that mine graph datasets to discover patterns corresponding to frequently occurring subgraphs can operate efficiently on graphs that are sparse, contain a large number of relatively small connected components, have vertices with low and bounded degrees, and contain well-labeled vertices and edges. However, for graphs that do not share these characteristics, these algorithms become highly unscalable. In this paper we present a heuristic algorithm called GREW to overcome the limitations of existing complete or heuristic frequent subgraph discovery algorithms. GREW is designed to operate on a large graph and to find patterns corresponding to connected subgraphs that have a large number of vertex-disjoint embeddings. Our experimental evaluation shows that GREW is efficient, can scale to very large graphs, and finds non-trivial patterns.
Citations: 146
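Selecting embeddings that share no vertex is an independent-set problem on the overlap graph of the embeddings, so a greedy pass is a natural way to illustrate the idea. The sketch below is not GREW itself: it assumes each embedding of a pattern is given as a set of host-graph vertex IDs (a hypothetical representation), and uses a min-degree heuristic that tries the least-overlapping embeddings first. The result is maximal but not necessarily maximum, so the count is a lower bound.

```python
from typing import FrozenSet, List, Sequence


def greedy_disjoint_embeddings(embeddings: Sequence[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Greedily pick embeddings that share no vertex with those already picked."""
    # For each embedding, count how many other embeddings it overlaps with.
    conflicts = [
        sum(1 for j, other in enumerate(embeddings) if j != i and emb & other)
        for i, emb in enumerate(embeddings)
    ]
    # Min-degree heuristic: try the least-conflicting embeddings first.
    order = sorted(range(len(embeddings)), key=lambda i: conflicts[i])
    used: set = set()
    chosen: List[FrozenSet[int]] = []
    for i in order:
        emb = embeddings[i]
        if used.isdisjoint(emb):  # no vertex in common with any chosen embedding
            chosen.append(emb)
            used |= emb
    return chosen


if __name__ == "__main__":
    # Hypothetical embeddings of a single-edge pattern in the path 1-2-3-4.
    embs = [frozenset({2, 3}), frozenset({1, 2}), frozenset({3, 4})]
    print(len(greedy_disjoint_embeddings(embs)))
```

Processing the embeddings in input order would keep only {2, 3}; trying the less-conflicting ones first keeps {1, 2} and {3, 4}.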
MMAC: a new multi-class, multi-label associative classification approach
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10117
F. Thabtah, P. Cowling, Yonghong Peng
Abstract: Building fast and accurate classifiers for large-scale databases is an important task in data mining. There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional classification techniques. In this paper, the problem of producing rules with multiple labels is investigated. We propose a new associative classification approach called multi-class, multi-label associative classification (MMAC). This paper also presents three measures for evaluating the accuracy of data mining classification approaches on a wide range of traditional and multi-label classification problems. Results for 28 different datasets show that the MMAC approach is an accurate and effective classification technique, highly competitive and scalable in comparison with other classification approaches.
Citations: 262
Detecting patterns of appliances from total load data using a dynamic programming approach
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10003
M. Baranski, J. Voss
Abstract: Nonintrusive appliance load monitoring (NIALM) systems require sufficiently accurate total load data to separate the load into its major appliances. Most available solutions separate the whole electric energy consumption based on measurements of all three voltages and currents. Aside from the cost of special measuring devices, the intrusion into the local installation is the main obstacle to reaching a wide market distribution. Using standard digital electricity meters could avoid this problem, but the information lost in the measured data has to be compensated for by more intelligent algorithms and implemented rules that disaggregate the total load trace from active power measurements alone. The paper presents a NIALM approach to analysing data collected from a standard digital electricity meter. To disaggregate the consumption of the entire active power into its major electrical end uses, an algorithm consisting of clustering methods, a genetic algorithm and a dynamic programming approach is presented. The genetic algorithm is used to combine frequently occurring events to create hypothetical finite state machines that model detectable appliances. The time series of each finite state machine is optimized using a dynamic programming method similar to the Viterbi algorithm.
Citations: 71
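The abstract says only that the method is "similar to the Viterbi algorithm". For reference, a generic textbook Viterbi decoder in log space is sketched below; it recovers the most likely hidden state sequence for a series of observations. The two-state off/on appliance model and every probability in the example are hypothetical illustrations, not the paper's model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: best log-probability of any path that ends in state s at time t.
    best: List[Dict[str, float]] = [
        {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the score into s.
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical appliance model: "off" mostly yields low power, "on" mostly high.
    states = ["off", "on"]
    start = {"off": 0.8, "on": 0.2}
    trans = {"off": {"off": 0.9, "on": 0.1}, "on": {"off": 0.2, "on": 0.8}}
    emit = {"off": {"low": 0.9, "high": 0.1}, "on": {"low": 0.2, "high": 0.8}}
    path, logp = viterbi(["low", "high", "high", "low"], states, start, trans, emit)
    print(path)  # ['off', 'on', 'on', 'off']
```

Working in log space avoids numerical underflow on long load traces, which is the usual reason to prefer it over multiplying raw probabilities.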
Using emerging patterns and decision trees in rare-class classification
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10058
Hamad Alhammady, K. Ramamohanarao
Abstract: The problem of classifying rarely occurring cases arises in many real-life applications. The scarcity of rare cases makes it difficult to classify them correctly using traditional classifiers. In this paper, we propose an approach that uses emerging patterns (EPs) (G. Dong and J. Li, 1999) and decision trees (DTs) in rare-class classification (EPDT). EPs are itemsets whose supports in one class are significantly higher than their supports in the other classes. EPDT employs the power of EPs to improve the quality of rare-case classification. To achieve this aim, we first introduce the idea of generating nonexisting rare-class instances, and then we over-sample the most important rare-class instances. Our experiments show that EPDT outperforms many classification methods.
Citations: 53
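"Significantly higher support" is commonly quantified, following Dong and Li (1999), by the growth rate: the support of an itemset in the target class divided by its support in the other class, with x/0 treated as infinity for x > 0 and 0/0 as 0. A minimal sketch, assuming transactions are Python sets of items and using a hypothetical threshold `min_growth`; it only illustrates the EP definition and is not the EPDT method.

```python
import math
from typing import FrozenSet, Sequence, Set


def support(itemset: FrozenSet[str], transactions: Sequence[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(
    itemset: FrozenSet[str],
    target: Sequence[Set[str]],
    background: Sequence[Set[str]],
) -> float:
    """Support in the target class divided by support in the background class."""
    s_target = support(itemset, target)
    s_background = support(itemset, background)
    if s_background == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_background


def is_emerging_pattern(
    itemset: FrozenSet[str],
    target: Sequence[Set[str]],
    background: Sequence[Set[str]],
    min_growth: float = 2.0,  # hypothetical threshold
) -> bool:
    return growth_rate(itemset, target, background) >= min_growth


if __name__ == "__main__":
    rare = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    common = [{"b"}, {"a", "b"}, {"c"}, {"b", "c"}]
    print(growth_rate(frozenset({"a"}), rare, common))  # 1.0 / 0.25 = 4.0
```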
Query-driven support pattern discovery for classification learning
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10032
Yiqiu Han, Wai Lam
Abstract: We propose a query-driven lazy learning algorithm that attempts to discover useful local patterns, called support patterns, for classifying a given query. The learning is customized to the query to avoid the horizon effect. We show that this query-driven learning algorithm is guaranteed to discover all support patterns with perfect expected accuracy in polynomial time. Experimental results on benchmark data sets also demonstrate the strong learning performance of our algorithm.
Citations: 0
Estimation of false negatives in classification
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10048
Sandeep Mane, J. Srivastava, San-Yih Hwang, J. Vayghan
Abstract: In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, high costs and constraints on an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is thus required to estimate the number of false negatives. A capture-recapture based method can be used to obtain a maximum-likelihood (ML) estimate of false negatives when two or more independent classifiers are available. When independence does not hold, log-linear models can be applied to obtain an estimate of false negatives. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the estimate of false negatives. Thus, ideally, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI machine learning repository are presented.
Citations: 9
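The simplest two-source capture-recapture estimator is the classic Lincoln-Petersen estimator. As a hedged sketch, assume two independent classifiers A and B, and that an expert has verified n_a true positives flagged by A, n_b flagged by B, and m flagged by both. The total number of positives is then estimated as n_a * n_b / m, and the false negatives (positives missed by both classifiers) as that total minus the positives seen by at least one. This illustrates the independent two-classifier case only; it is not the paper's exact formulation and does not cover its log-linear models for dependent classifiers.

```python
def lincoln_petersen_false_negatives(n_a: int, n_b: int, m: int) -> float:
    """Estimate the number of positives missed by both classifiers A and B.

    n_a: verified true positives flagged by classifier A
    n_b: verified true positives flagged by classifier B
    m:   verified true positives flagged by both A and B
    Assumes A and B detect positives independently of each other.
    """
    if m <= 0:
        raise ValueError("need at least one positive flagged by both classifiers")
    if m > min(n_a, n_b):
        raise ValueError("overlap m cannot exceed n_a or n_b")
    estimated_total = n_a * n_b / m  # Lincoln-Petersen estimate of all positives
    observed = n_a + n_b - m  # positives flagged by at least one classifier
    return estimated_total - observed  # estimated false negatives of both together


if __name__ == "__main__":
    # Hypothetical counts: total = 60 * 50 / 40 = 75, observed = 70.
    print(lincoln_petersen_false_negatives(60, 50, 40))  # 5.0
```

Algebraically, the estimate equals (n_a - m)(n_b - m) / m, which makes it clear why a small overlap m between the classifiers inflates the estimate.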
Analysis of consensus partition in cluster ensemble
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10100
A. Topchy, Martin H. C. Law, Anil K. Jain, A. Fred
Abstract: In combining multiple partitions, one is usually interested in deriving a consensus solution whose quality is better than that of the given partitions. Several recent studies have empirically demonstrated the improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, the convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper, we present formal arguments on the effectiveness of cluster ensembles from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and a consensus function with plurality voting. The second studies the properties of the "mean" partition of an ensemble with respect to a metric on the space of all possible partitions. In both cases, the consensus solution can be shown to converge to a true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensembles.
Citations: 175
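A consensus function based on re-labeling and plurality voting can be sketched as follows. The assumptions here are ours, not the paper's: every partition uses labels 0..k-1 with the same k; each partition is aligned to the first one by trying every label permutation and keeping the one with the most agreements (feasible only for small k, where the Hungarian algorithm would be the usual scalable alternative); and voting ties go to the smallest label.

```python
from collections import Counter
from itertools import permutations
from typing import List, Sequence


def align_labels(reference: Sequence[int], labels: Sequence[int], k: int) -> List[int]:
    """Relabel `labels` with the permutation of 0..k-1 that best agrees with `reference`."""
    best_perm, best_agree = tuple(range(k)), -1
    for perm in permutations(range(k)):
        agree = sum(1 for ref, lab in zip(reference, labels) if perm[lab] == ref)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[lab] for lab in labels]


def plurality_consensus(partitions: Sequence[Sequence[int]], k: int) -> List[int]:
    """Align every partition to the first one, then take a per-object plurality vote."""
    reference = partitions[0]
    aligned = [list(reference)] + [align_labels(reference, p, k) for p in partitions[1:]]
    consensus = []
    for votes in zip(*aligned):  # all labels assigned to one object
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break (an assumption): choose the smallest label among the winners.
        consensus.append(min(label for label, c in counts.items() if c == top))
    return consensus


if __name__ == "__main__":
    p1 = [0, 0, 1, 1, 1]
    p2 = [1, 1, 0, 0, 0]  # same grouping as p1 with the labels swapped
    p3 = [0, 0, 0, 1, 1]
    print(plurality_consensus([p1, p2, p3], k=2))  # [0, 0, 1, 1, 1]
```

Re-labeling is needed because cluster IDs are arbitrary: without it, p2 would vote against p1 on every object even though both describe the same grouping.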
An evaluation of approaches to classification rule selection
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10012
Frans Coenen, P. Leng
Abstract: In this paper a number of classification rule evaluation measures are considered. In particular, the authors review a variety of selection techniques used to order the classification rules contained in a classifier, and a number of mechanisms used to classify unseen data. The authors demonstrate that rule ordering based on the size of the antecedent works well under certain conditions.
Citations: 48
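The abstract does not specify how antecedent-size ordering is applied, so the sketch below makes its assumptions explicit. Each rule is a hypothetical `Rule` with an antecedent item set, a class label and a confidence. Rules are sorted with larger (more specific) antecedents first, and confidence breaks ties. An unseen record is classified by the first rule whose antecedent it satisfies (a common "first match" mechanism), falling back to a default class. Both the sort direction and the tie-breaker are illustrative choices, not necessarily the configuration the authors found to work well.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # items that must all be present for the rule to fire
    label: str  # class predicted when the rule fires
    confidence: float


def order_by_antecedent_size(rules: Sequence[Rule]) -> List[Rule]:
    # Larger antecedents first; higher confidence breaks ties (both assumptions).
    return sorted(rules, key=lambda r: (-len(r.antecedent), -r.confidence))


def classify_first_match(record: Set[str], ordered: Sequence[Rule], default: str) -> str:
    for rule in ordered:
        if rule.antecedent <= record:  # every antecedent item appears in the record
            return rule.label
    return default


if __name__ == "__main__":
    rules = [
        Rule(frozenset({"a"}), "x", 0.9),
        Rule(frozenset({"a", "b"}), "y", 0.7),
    ]
    ordered = order_by_antecedent_size(rules)
    print(classify_first_match({"a", "b", "c"}, ordered, default="z"))  # y
```

Ordering by confidence alone would fire the {a} rule first and predict "x" for the same record, which shows how strongly the ordering strategy affects the output of a first-match classifier.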
An adaptive learning approach for noisy data streams
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10049
F. Chu, Yizhou Wang, C. Zaniolo
Abstract: Two critical challenges typically associated with mining data streams are concept drift and data contamination. To address these challenges, we seek learning techniques and models that are robust to noise and can adapt to changes in a timely fashion. We approach the stream-mining problem using a statistical estimation framework and propose a fast and robust discriminative model for learning from noisy data streams. We build an ensemble of classifiers and achieve timely adaptation by weighting the classifiers in a way that maximizes the likelihood of the data. We further employ robust statistical techniques to alleviate the problem of noise sensitivity. Experimental results on both synthetic and real-life data sets demonstrate the effectiveness of this model learning approach.
Citations: 58
SUMMARY: efficiently summarizing transactions for clustering
Fourth IEEE International Conference on Data Mining (ICDM'04) Pub Date: 2004-11-01 DOI: 10.1109/ICDM.2004.10105
Jianyong Wang, G. Karypis
Abstract: Frequent itemset mining was initially proposed and has been studied extensively in the context of association rule mining. In recent years, several studies have also extended its application to transaction (or document) classification and clustering. However, most frequent-itemset-based clustering algorithms need to first mine a large intermediate set of frequent itemsets in order to identify a subset of the most promising ones that can be used for clustering. In this paper, we study how to directly find a subset of high-quality frequent itemsets that can be used as a concise summary of the transaction database and to cluster categorical data. By exploring some properties of the subset of itemsets that we are interested in, we propose several search space pruning methods and design an efficient algorithm called SUMMARY. Our empirical results show that SUMMARY runs very fast even when the minimum support is extremely low and scales very well with respect to database size. Surprisingly, as a pure frequent itemset mining algorithm, it is very effective at clustering categorical data and summarizing dense transaction databases.
Citations: 23