2002 IEEE International Conference on Data Mining, 2002. Proceedings. Latest articles, page 9

Predicting rare events in temporal domains

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183991

R. Vilalta, Sheng Ma

Citations: 177

Reviewing RELIEF and its extensions: a new approach for estimating attributes considering high-correlated features

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184009

R. López

Citations: 7

Intersection based generalization rules for the analysis of symbolic septic shock patient data

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184026

J. Paetz

Citations: 10

Employing discrete Bayes error rate for discretization and feature selection tasks

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183916

A. Mittal, L. Cheong

Citations: 7

Using category-based adherence to cluster market-basket data

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184000

Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen

Abstract: We devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse, with massive outliers. Without explicitly considering the presence of the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing with items in the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for the result representation of the clustering techniques for market-basket data. In view of the features of market-basket data, we devise a measurement, called the category-based adherence, and utilize this measurement to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item or a category node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this category-based adherence measurement, we develop an efficient clustering algorithm, called algorithm CBA, for market-basket data with the objective to minimize the category-based adherence. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experiments on both real and synthetic datasets show that, with the taxonomy information, algorithm CBA significantly outperforms prior work in both execution efficiency and clustering quality for market-basket data.
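
The two definitions in this abstract (item-to-cluster distance and category-based adherence) can be illustrated with a minimal sketch. It is not the authors' CBA algorithm. It assumes the taxonomy is given as a child-to-parent mapping, that the "nearest large node" is the item itself or its closest ancestor, and that an item with no large ancestor is charged one link past the top of its branch. The names `item_distance`, `adherence`, `parent`, and `large_nodes` are hypothetical.

```python
from typing import Dict, Iterable, Set


def item_distance(item: str, parent: Dict[str, str], large_nodes: Set[str]) -> int:
    """Count links from `item` up to its nearest large node.

    Assumption: only the item itself and its ancestors are candidates.
    Returns 0 when the item itself is large.
    """
    links = 0
    node = item
    while node is not None:
        if node in large_nodes:
            return links
        node = parent.get(node)  # None once we step past the top of the branch
        links += 1
    # Assumption: no large ancestor exists, so charge one link past the top.
    return links


def adherence(transaction: Iterable[str], parent: Dict[str, str],
              large_nodes: Set[str]) -> float:
    """Average item distance of a transaction to a cluster (lower is better)."""
    items = list(transaction)
    if not items:
        raise ValueError("transaction must contain at least one item")
    return sum(item_distance(i, parent, large_nodes) for i in items) / len(items)


# Hypothetical two-level taxonomy: leaf item -> category.
parent = {"cola": "drinks", "juice": "drinks", "chips": "snacks"}

# Large nodes of one cluster: nodes whose occurrence count exceeds the threshold.
cluster_large_nodes = {"drinks"}

print(adherence(["cola", "chips"], parent, cluster_large_nodes))
```

Under these assumptions, a clustering step would assign each transaction to the cluster whose large nodes give it the smallest adherence value.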

Citations: 16

A hybrid approach to discover Bayesian networks from databases using evolutionary programming

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183994

M. Wong, Shing Yan Lee, K. Leung

Citations: 24

From path tree to frequent patterns: a framework for mining frequent patterns

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183996

Yabo Xu, J. Yu, Guimei Liu, Hongjun Lu

Citations: 23

High performance data mining using the nearest neighbor join

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183884

C. Böhm, Florian Krebs

Abstract: The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, where the user defines a distance threshold for the join, and the closest point query or k-distance join, which retrieves the k most similar pairs. In this paper, we investigate an important third similarity join operation, called the k-nearest neighbor join, which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized index structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
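
To make the semantics of the k-nearest neighbor join concrete, here is a brute-force sketch that pairs each point of one set with its k nearest neighbors in the other set. It is not the MuX-based algorithm proposed in the paper and makes no attempt at the loading or processing optimizations the abstract mentions. It assumes Euclidean distance and points given as tuples; the name `knn_join` is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_join(r_points: Sequence[Point], s_points: Sequence[Point],
             k: int) -> List[Tuple[Point, Point]]:
    """Return (r, s) pairs where s is one of the k nearest neighbors of r in S.

    Brute force: O(|R| * |S|) distance computations, no index structure.
    Pairs for each r are ordered from nearest to farthest neighbor.
    """
    result = []
    for r in r_points:
        # Euclidean distance is an assumption made for this illustration.
        nearest = heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
        result.extend((r, s) for s in nearest)
    return result


R = [(0.0, 0.0), (10.0, 10.0)]
S = [(1.0, 0.0), (0.0, 2.0), (9.0, 9.0), (20.0, 20.0)]

for r, s in knn_join(R, S, k=2):
    print(r, "->", s)
```

Note that the result always contains exactly k pairs per point of R (when S has at least k points), which is what distinguishes this operation from the distance range join and the k-distance join described in the abstract.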

Citations: 46

Adaptive ripple down rules method based on minimum description length principle

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183998

Tetsuya Yoshida, Takuya Wada, H. Motoda, T. Washio

Abstract: When class distribution changes, some pieces of knowledge previously acquired become worthless, and the existence of such knowledge may hinder acquisition of new knowledge. The paper proposes an adaptive ripple down rules (RDR) method based on the minimum description length principle aiming at knowledge acquisition in a dynamically changing environment. To cope with the change of class distribution, knowledge deletion is carried out as well as knowledge acquisition so that useless knowledge is properly discarded. To cope with the change of the source of knowledge, RDR knowledge based systems can be constructed adaptively by acquiring knowledge from both domain experts and data. By incorporating inductive learning methods, knowledge acquisition can be carried out even when only either data or experts are available by switching the source of knowledge from domain experts to data and vice versa at any time of knowledge acquisition. Since experts need not be available all the time, it contributes to reducing the cost of personnel expenses. Experiments were conducted by simulating the change of the source of knowledge and the change of class distribution using the datasets in UCI repository. The results are encouraging.

Citations: 13

Mining surveillance video for independent motion detection

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184043

Zhongfei Zhang

Citations: 13