2002 IEEE International Conference on Data Mining, 2002. Proceedings. Latest articles

Improving medical/biological data classification performance by wavelet preprocessing
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184022
Qi Li, Tao Li, Shenghuo Zhu, C. Kambhamettu
Abstract: Many real-world datasets contain noise that can degrade the performance of learning algorithms. Motivated by the success of wavelet denoising techniques on image data, we explore a general solution that alleviates the effect of noisy data through wavelet preprocessing for medical/biological data classification. Our experiments are divided into two categories: one applies different classification algorithms to a specific database, and the other applies a specific classification algorithm (decision tree) to different databases. The experimental results show that wavelet denoising of noisy data can improve the accuracy of those classification methods if the localities of the attributes are strong enough.
Citations: 22
A comparison study on algorithms for incremental update of frequent sequences
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184001
Minghua Zhang, B. Kao, Chi Lap Yip
Abstract: The problem of mining frequent sequences is to extract frequently occurring subsequences in a sequence database. Algorithms for this mining problem include GSP, MFS, and SPADE. The problem of incremental update of frequent sequences is to keep track of the set of frequent sequences as the underlying database changes. Previous studies have extended the traditional algorithms to solve the update problem efficiently. These incremental algorithms include ISM, GSP+, and MFS+. Each incremental algorithm has its own characteristics, and they have been studied and evaluated separately under different scenarios. This paper presents a comprehensive study of the relative performance of the incremental algorithms as well as their non-incremental counterparts. Our goal is to provide guidelines on the choice of an algorithm for solving the incremental update problem given the various characteristics of a sequence database.
Citations: 12
Solving the fragmentation problem of decision trees by discovering boundary emerging patterns
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184021
Jinyan Li, L. Wong
Abstract: The single coverage constraint discourages a decision tree from containing many significant rules. The loss of significant rules leads to a loss in accuracy. On the other hand, the fragmentation problem causes a decision tree to contain too many minor rules. The presence of minor rules decreases the accuracy. We propose to use emerging patterns to solve these problems. In our approach, many globally significant rules can be discovered. Extensive experimental results on gene expression datasets show that our approach is more accurate than single C4.5 trees and is also better than bagged or boosted C4.5 trees.
Citations: 8
Mixtures of ARMA models for model-based time series clustering
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184037
Yimin Xiong, D. Yeung
Abstract: Clustering problems are central to many knowledge discovery and data mining tasks. However, most existing clustering methods can only work with fixed-dimensional representations of data patterns. In this paper we study the clustering of data patterns that are represented as sequences or time series, possibly of different lengths. We propose a model-based approach to this problem using mixtures of autoregressive moving average (ARMA) models. We derive an expectation-maximization (EM) algorithm for learning the mixing coefficients as well as the parameters of the component models. Experiments were conducted on simulated and real datasets. Results show that our method compares favorably with another method recently proposed by others for similar time series clustering problems.
Citations: 135
Implementation of a least fixpoint operator for fast mining of relational databases
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184016
H. Jamil
Abstract: Recent research has focused on computing large item sets for association rule mining using SQL3 least fixpoint computation, and by exploiting the monotonic nature of SQL3 aggregate functions such as sum and create view recursive constructs. Such approaches allow us to view mining as an ad hoc querying exercise and to treat the efficiency issue as an optimization problem. We present a recursive implementation of a recently proposed least fixpoint operator for computing large item sets from object-relational databases. We present experimental evidence to show that our implementation compares well with several well-regarded contemporary algorithms for large item set generation.
Citations: 0
Adapting information extraction knowledge for unseen Web sites
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183995
Tak-Lam Wong, Wai Lam
Abstract: We propose a wrapper adaptation framework that aims at adapting a learned wrapper to an unseen Web site. It significantly reduces human effort in constructing wrappers. Our framework makes use of extraction rules previously discovered from a particular site to seek potential training example candidates for an unseen site. Rule generalization and text categorization are employed for finding suitable example candidates. Another feature of our approach is that it makes use of the previously discovered lexicon to classify good training examples automatically for the new site. We conducted extensive experiments to evaluate the quality of the extraction performance and the adaptability of our approach.
Citations: 6
A new implementation technique for fast spectral based document retrieval systems
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183922
L. Park, M. Palaniswami, K. Ramamohanarao
Abstract: The traditional methods of spectral text retrieval (FDS, CDS) create an index of spatial data and convert the data to its spectral form at query time. We present a new method of implementing and querying an index containing spectral data that preserves the high-precision performance of the spectral methods, reduces the time needed to resolve the query, and maintains an acceptable size for the index. This is done by taking advantage of the properties of the discrete cosine transform and by applying ideas from vector space document ranking methods.
Citations: 7
On computing condensed frequent pattern bases
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183928
J. Pei, Guozhu Dong, Wei Zou, Jiawei Han
Abstract: Frequent pattern mining has been studied extensively. However, the effectiveness and efficiency of this mining is often limited, since the number of frequent patterns generated is often too large. In many applications it is sufficient to generate and examine only frequent patterns with support frequency in close-enough approximation instead of in full precision. Such a compact but close-enough frequent pattern base is called a condensed frequent pattern base. In this paper we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent pattern bases. A few algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and their efficient computation methods are investigated. A systematic performance study is conducted on different kinds of databases, which demonstrates the effectiveness and efficiency of our approach at handling frequent pattern mining in large databases.
Citations: 71
Neighborgram clustering. Interactive exploration of cluster neighborhoods
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184004
M. Berthold, Bernd Wiswedel, D. E. Patterson
Abstract: We describe an interactive way to generate a set of clusters for a given data set. The clustering is done by constructing local histograms, which can then be used to visualize, select, and fine-tune potential cluster candidates. The accompanying algorithm can also generate clusters automatically, allowing for an automatic or semi-automatic clustering process where the user only occasionally interacts with the algorithm. We illustrate the ability to automatically identify and visualize clusters using NCI's AIDS Antiviral Screen data set.
Citations: 13
On active learning for data acquisition
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184002
Zhiqiang Zheng, B. Padmanabhan
Abstract: Many applications are characterized by having naturally incomplete data on customers, where data on only some fixed set of local variables is gathered. However, having a more complete picture can help build better models. The naive solution to this problem, acquiring complete data for all customers, is often impractical due to the costs of doing so. A possible alternative is to acquire complete data for "some" customers and to use this to improve the models built. The data acquisition problem is determining how many, and which, customers to acquire additional data from. In this paper we suggest using active learning based approaches for the data acquisition problem. In particular, we present initial methods for data acquisition and evaluate these methods experimentally on web usage data and UCI datasets. Results show that the methods perform well and indicate that active learning based methods for data acquisition can be a promising area for data mining research.
Citations: 74