2002 IEEE International Conference on Data Mining, 2002. Proceedings. Latest articles

Predicting rare events in temporal domains
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183991
R. Vilalta, Sheng Ma
Abstract: Temporal data mining aims at finding patterns in historical data. Our work proposes an approach to extract temporal patterns from data to predict the occurrence of target events, such as computer attacks on host networks or fraudulent transactions in financial institutions. Our problem formulation exhibits two major challenges: 1) we assume events are characterized by categorical features and display uneven inter-arrival times, an assumption that falls outside the scope of classical time-series analysis; 2) we assume target events are highly infrequent, so predictive techniques must deal with the class-imbalance problem. We propose an efficient algorithm that tackles the challenges above by transforming the event prediction problem into a search for all frequent eventsets preceding target events. The class-imbalance problem is overcome by a search for patterns on the minority class exclusively; the discrimination power of patterns is then validated against other classes. Patterns are then combined into a rule-based model for prediction. Our experimental analysis indicates the types of event sequences where target events can be accurately predicted.
Citations: 177
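The search for frequent eventsets preceding target events, as described in the preceding abstract, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the fixed time window, the `max_size` cap, and the toy log are all assumptions made for illustration, the enumeration is exhaustive rather than efficient, and the validation step against the other classes is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Event = Tuple[float, str]  # (timestamp, categorical event type)


def windows_before_targets(events: Sequence[Event], target: str, window: float) -> List[Set[str]]:
    """For each target event, collect the event types seen in the preceding time window."""
    result = []
    for t, kind in events:
        if kind == target:
            result.append({k for u, k in events if t - window <= u < t and k != target})
    return result


def frequent_eventsets(windows: Sequence[Set[str]], min_support: float,
                       max_size: int = 2) -> Set[FrozenSet[str]]:
    """Eventsets (up to max_size types) present in at least min_support of the windows."""
    counts: Counter = Counter()
    for w in windows:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(w), size):
                counts[frozenset(combo)] += 1
    return {es for es, c in counts.items() if c / len(windows) >= min_support}


# Toy log (hypothetical): "X" is the rare target event; timestamps are unevenly spaced.
log = [(1, "a"), (2, "b"), (3, "X"), (5, "c"), (7, "a"), (8, "b"), (9, "X"), (12, "a"), (14, "X")]
windows = windows_before_targets(log, target="X", window=3)
frequent = frequent_eventsets(windows, min_support=0.6)
```

Only the windows that precede target events are mined, which mirrors the idea of searching for patterns on the minority class exclusively. A fuller version would then check how often each eventset also appears in windows that do not precede a target.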
Reviewing RELIEF and its extensions: a new approach for estimating attributes considering high-correlated features
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184009
R. López
Abstract: The RELIEF algorithm and its extensions are some of the best-known filter methods for estimating the quality of attributes in classification problems dealing with both dependent and independent features. These methods attempt to find all meaningful features for each problem (both weakly and strongly relevant ones), so they are usually employed as a first stage for detecting irrelevant attributes. Nevertheless, in this paper we show that RELIEF-family algorithms present some important limitations that could distort the selection of the final feature subset, especially in the presence of highly correlated attributes. To overcome these difficulties, a new approach has been developed (the WACSA algorithm), whose performance and validity are verified on well-known data sets.
Citations: 7
Intersection based generalization rules for the analysis of symbolic septic shock patient data
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184026
J. Paetz
Abstract: In intensive care units much data is irregularly recorded. Here, we consider the analysis of symbolic septic shock patient data. We show that it could be worth considering the generalization paradigm (individual cases generalized to more general rules) instead of the association paradigm (combining single attributes) when considering very individual cases (e.g. patients) and when expecting longer rather than shorter rules. We present an algorithm for rule generation and classification based on heuristically generated set-based intersections. We demonstrate the usefulness of our algorithm by analysing our septic shock patient data.
Citations: 10
Employing discrete Bayes error rate for discretization and feature selection tasks
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183916
A. Mittal, L. Cheong
Abstract: The tasks of discretization and feature selection are frequently used to improve classification accuracy. We use a discrete approximation of the Bayes error rate to perform discretization on the features. The discretization procedure targets minimization of the Bayes error rate within each partition. A class-pair discriminatory measure can be defined on the discretized partitions, which forms the basis of the feature selection algorithm. A small value of this measure for a class pair indicates that the class pair under consideration is confusing, and the features that distinguish the two classes well should be chosen first. A video classification problem on a large database is used to compare a classifier using our discretization and feature selection tasks with SVM, a neural network classifier, decision trees, and a K-nearest neighbor classifier.
Citations: 7
Using category-based adherence to cluster market-basket data
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184000
Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen
Abstract: We devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse, with massive outliers. Without explicitly considering the presence of the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for the result representation of the clustering techniques for market-basket data. In view of the features of market-basket data, we devise a measurement, called the category-based adherence, and utilize this measurement to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item or a category node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this category-based adherence measurement, we develop an efficient clustering algorithm, called algorithm CBA, for market-basket data with the objective of minimizing the category-based adherence. A validation model based on information gain is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm CBA significantly outperforms prior work in both execution efficiency and clustering quality for market-basket data.
Citations: 16
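The category-based adherence defined in the preceding abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the CBA algorithm itself: the taxonomy `PARENT`, the per-cluster occurrence counts, and the function names are hypothetical; the "nearest large node" is assumed to be found by walking from the item up through its ancestors; and the fallback when no large node exists is an arbitrary choice.

```python
from typing import Dict, Iterable, Optional

# Hypothetical taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
}


def item_distance(item: str, counts: Dict[str, int], threshold: int) -> int:
    """Number of links from the item up to its nearest large node (itself or an ancestor)."""
    links = 0
    node: Optional[str] = item
    while node is not None:
        if counts.get(node, 0) > threshold:  # "large": occurrence count exceeds the threshold
            return links
        node = PARENT[node]
        links += 1
    return links  # assumption: no large node found, so charge one link past the root


def adherence(transaction: Iterable[str], counts: Dict[str, int], threshold: int) -> float:
    """Average distance of the transaction's items to the cluster described by counts."""
    items = list(transaction)
    return sum(item_distance(i, counts, threshold) for i in items) / len(items)


# Hypothetical occurrence counts of items and categories within one cluster.
cluster_counts = {"milk": 5, "dairy": 7, "food": 9, "bread": 1, "bakery": 1, "cheese": 2}
score = adherence(["milk", "cheese", "bread"], cluster_counts, threshold=3)
```

Here `milk` is itself large (distance 0), `cheese` reaches the large category `dairy` in one link, and `bread` has to climb two links to `food`, so the adherence is the average of 0, 1, and 2. A lower value means the transaction fits the cluster better, which is why the clustering objective is to minimize it.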
A hybrid approach to discover Bayesian networks from databases using evolutionary programming
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183994
M. Wong, Shing Yan Lee, K. Leung
Abstract: Describes a data mining approach that employs evolutionary programming to discover knowledge represented in Bayesian networks. There are two different approaches to the network learning problem. The first one uses dependency analysis, while the second one searches for good network structures according to a metric. Unfortunately, both approaches have their own drawbacks. Thus, we propose a hybrid algorithm of the two approaches, which consists of two phases, namely, the conditional independence test phase and the search phase. A new operator is introduced to further enhance the search efficiency. We conduct a number of experiments and compare the hybrid algorithm with our previous algorithm, MDLEP, which uses EP for network learning. The empirical results illustrate that the new approach has better performance. We apply the approach to data sets of direct marketing and compare the performance of the evolved Bayesian networks obtained by the new algorithm with the models generated by other methods. In the comparison, the induced Bayesian networks produced by the new algorithm outperform the other models.
Citations: 24
From path tree to frequent patterns: a framework for mining frequent patterns
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183996
Yabo Xu, J. Yu, Guimei Liu, Hongjun Lu
Abstract: We propose a framework for mining frequent patterns from large transactional databases. The core of the framework is a coded prefix-path tree with two representations, namely, a memory-based prefix-path tree and a disk-based prefix-path tree. The disk-based prefix-path tree is simple in its data structure yet rich in the information it contains, and is small in size. The memory-based prefix-path tree is simple and compact. Based on the memory-based prefix-path tree, a new depth-first frequent pattern discovery algorithm, called PP-Mine, is proposed that outperforms FP-growth significantly. The memory-based prefix-path tree can be stored on disk using a disk-based prefix-path tree with the assistance of the new coding scheme. We present loading algorithms to load the minimal required disk-based prefix-path tree into main memory. Our technique is to push constraints into the loading process, which has not been well studied yet.
Citations: 23
High performance data mining using the nearest neighbor join
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183884
C. Böhm, Florian Krebs
Abstract: The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, where the user defines a distance threshold for the join, and the closest point query or k-distance join, which retrieves the k most similar pairs. In this paper, we investigate an important third similarity join operation called the k-nearest neighbor join, which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized index structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
Citations: 46
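To make the k-nearest neighbor join operation from the preceding abstract concrete, here is a brute-force sketch. It only illustrates what the join returns; it does not use the multipage index (MuX) or any of the loading and processing strategies from the paper. The function name `knn_join`, the use of Euclidean distance, and the sample points are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_join(r: Sequence[Point], s: Sequence[Point], k: int) -> Dict[Point, List[Point]]:
    """Brute-force k-nn join: pair each point of r with its k closest points in s."""
    result: Dict[Point, List[Point]] = {}
    for p in r:
        # nsmallest returns the k points of s with the smallest Euclidean distance to p,
        # ordered from nearest to farthest.
        result[p] = heapq.nsmallest(k, s, key=lambda q: math.dist(p, q))
    return result


# Hypothetical sample data.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
joined = knn_join(R, S, k=2)
```

Unlike the distance range join, the result size is fixed at k partners per point of the first set, regardless of how far away those partners are. The brute-force version compares every pair of points, which is the cost an index-based algorithm aims to avoid.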
Adaptive ripple down rules method based on minimum description length principle
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183998
Tetsuya Yoshida, Takuya Wada, H. Motoda, T. Washio
Abstract: When class distribution changes, some pieces of knowledge previously acquired become worthless, and the existence of such knowledge may hinder acquisition of new knowledge. The paper proposes an adaptive ripple down rules (RDR) method based on the minimum description length principle, aimed at knowledge acquisition in a dynamically changing environment. To cope with the change of class distribution, knowledge deletion is carried out as well as knowledge acquisition so that useless knowledge is properly discarded. To cope with the change of the source of knowledge, RDR knowledge-based systems can be constructed adaptively by acquiring knowledge from both domain experts and data. By incorporating inductive learning methods, knowledge acquisition can be carried out even when only one of data or experts is available, by switching the source of knowledge from domain experts to data and vice versa at any time during knowledge acquisition. Since experts need not be available all the time, this helps reduce personnel costs. Experiments were conducted by simulating the change of the source of knowledge and the change of class distribution using the datasets in the UCI repository. The results are encouraging.
Citations: 13
Mining surveillance video for independent motion detection
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184043
Zhongfei Zhang
Abstract: This paper addresses the special applications of data mining techniques in homeland defense. The problem targeted, which is frequently encountered in military/intelligence surveillance, is to mine a massive, automatically collected surveillance video database to retrieve the shots containing independently moving targets. A novel solution to this problem is presented in this paper, which offers a completely qualitative approach to solving the automatic independent motion detection problem directly from the compressed surveillance video, with faster than real-time mining performance. This approach is based on linear system consistency analysis, and consequently is called QLS. Since the QLS approach focuses only on what exactly is necessary to compute a solution, it keeps the computation to a minimum and maximizes efficacy. Evaluations on real data show that QLS delivers effective mining performance at the achieved efficiency.
Citations: 13