2002 IEEE International Conference on Data Mining, 2002. Proceedings: Latest Publications

On the mining of substitution rules for statistically dependent items
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183986
Wei-Guang Teng, M. Hsieh, Ming-Syan Chen
Abstract: In this paper a new mining capability, called mining of substitution rules, is explored. A substitution refers to the choice made by a customer to replace the purchase of some items with that of others. The process of mining substitution rules can be decomposed into two procedures. The first identifies concrete itemsets among a large number of frequent itemsets, where a concrete itemset is a frequent itemset whose items are statistically dependent. The second is substitution rule generation. Two concrete itemsets X and Y form a substitution rule, denoted by X ▵ Y to mean that X is a substitute for Y, if and only if X and Y are negatively correlated and the negative association rule X → ¬Y exists. We derive theoretical properties for the model of substitution rule mining. Then, in light of these properties, the SRM (substitution rule mining) algorithm is designed and implemented to discover substitution rules efficiently while attaining good statistical significance. Empirical studies are performed to evaluate the performance of the SRM algorithm. It is shown that SRM produces substitution rules of very high quality.
Citations: 67
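The two conditions in the substitution-rule definition (negative correlation plus a negative association rule X → ¬Y) can be checked directly from itemset supports. The Python sketch below does only that: it assumes transactions are sets of items, uses the hypothetical thresholds `min_sup` and `min_conf`, compares P(XY) with P(X)P(Y) as the correlation test, and does not reproduce SRM's statistical significance testing or its concrete-itemset step.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def is_substitution_candidate(transactions, x, y, min_sup=0.1, min_conf=0.5):
    """Test the two conditions named in the abstract for X substituting Y.

    min_sup and min_conf are hypothetical thresholds; SRM's statistical
    tests and its concreteness check on X and Y are not reproduced here.
    """
    sx = support(transactions, x)
    sy = support(transactions, y)
    sxy = support(transactions, set(x) | set(y))
    if sx == 0:
        return False
    # (1) Negative correlation: X and Y co-occur less often than independence predicts.
    negatively_correlated = sxy < sx * sy
    # (2) Negative rule X -> not-Y: support of "X without Y" and its confidence.
    sup_x_not_y = sx - sxy
    conf_x_not_y = sup_x_not_y / sx
    return negatively_correlated and sup_x_not_y >= min_sup and conf_x_not_y >= min_conf
```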
Automatic web page classification in a dynamic and hierarchical way
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183930
Xiaogang Peng, Ben Choi
Abstract: Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although many automatic classification algorithms and systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of web pages entering the system. They also require searching through all existing categories to make any classification. We propose a dynamic and hierarchical classification system that is capable of adding new categories as required, organizing the web pages into a tree structure, and classifying web pages by searching through only one path of the tree structure. Our test results show that the proposed single-path search technique reduces the search complexity and increases accuracy by 6% compared to related algorithms. Our dynamic-category expansion technique also achieves satisfactory results when adding new categories to the system as required.
Citations: 47
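The single-path search described in the abstract can be illustrated with a small category tree. In this sketch the `Category` class, the term-weight centroids, and cosine similarity as the matching score are all assumptions rather than details taken from the paper; the point is that classification follows only the best-matching child at each level instead of comparing the page against every category. Dynamic category expansion is not modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    centroid: dict                       # term -> weight (hypothetical representation)
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_single_path(root, doc):
    """Descend from the root, following only the most similar child at each level.

    Returns the category names along the chosen root-to-leaf path.
    """
    path, node = [root.name], root
    while node.children:
        node = max(node.children, key=lambda c: cosine(c.centroid, doc))
        path.append(node.name)
    return path
```

Only one sibling list is scored per level, so the work grows with tree depth times branching factor rather than with the total number of categories.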
Mining genes in DNA using GeneScout
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1184041
M. M. Yin, J. Wang
Abstract: In this paper we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites, including protein-translation start sites and mRNA splicing junction donor and acceptor sites. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization (EM) algorithm, and its performance on vertebrate gene prediction is evaluated using 10-fold cross-validation. Experimental results show the good performance of the proposed system and its complementarity to a widely used gene detection system.
Citations: 1
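Once candidate signals are arranged in a directed acyclic graph, finding the best-scoring path is a standard dynamic program over a topological order. The sketch below shows that generic step only: node names and edge scores are hypothetical stand-ins for GeneScout's HMM-derived site scores, and how the paper actually builds G is not reproduced. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def best_path(edges, source, sink):
    """Highest-scoring source -> sink path in a DAG.

    edges: dict mapping node -> list of (successor, score) pairs.
    Returns (total_score, path), or (None, []) if sink is unreachable.
    """
    # graphlib expects a mapping from each node to its predecessors.
    preds = {}
    for u, outs in edges.items():
        preds.setdefault(u, set())
        for v, _ in outs:
            preds.setdefault(v, set()).add(u)
    best = {source: 0.0}
    back = {}
    # In topological order, best[u] is final before u's out-edges are relaxed.
    for u in TopologicalSorter(preds).static_order():
        if u not in best:
            continue  # not reachable from source
        for v, w in edges.get(u, []):
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u
    if sink not in best:
        return None, []
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return best[sink], path[::-1]
```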
Mining general temporal association rules for items with different exhibition periods
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183886
Cheng-Yue Chang, Ming-Syan Chen, Chang-Hung Lee
Abstract: In this paper we explore a new model of mining general temporal association rules from large databases, in which the exhibition periods of the items may differ from one another. In this new model, the downward closure property, which all prior Apriori-based algorithms relied upon to attain good efficiency, is no longer valid. As a result, efficiently generating candidate itemsets from large databases becomes the major challenge. To address this issue, we develop an efficient algorithm, referred to as SPF (Segmented Progressive Filter). The basic idea behind SPF is to first segment the database into sub-databases such that the items in each sub-database share either a common starting time or a common ending time. Then, for each sub-database, SPF progressively filters candidate 2-itemsets with cumulative filtering thresholds, either forward or backward in time. This feature allows SPF to adopt the scan reduction technique, generating all candidate k-itemsets (k > 2) directly from candidate 2-itemsets. The experimental results show that SPF significantly outperforms other schemes extended from prior methods in terms of execution time and scalability.
Citations: 103
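The scan reduction step mentioned in the abstract can be sketched as follows: starting from the candidate 2-itemsets C2, keep every k-itemset whose 2-subsets all belong to C2 (this should coincide with repeatedly applying the Apriori join-and-prune step to C2), so that all such candidates can then be counted in a single database scan. This brute-force version is for illustration only and assumes nothing about SPF's segmentation or its cumulative filtering thresholds.

```python
from itertools import combinations


def candidates_from_c2(c2, k):
    """All k-itemsets (k > 2) whose every 2-subset is a candidate 2-itemset.

    c2: iterable of 2-item tuples or sets. Returns a set of sorted tuples.
    """
    pairs = {frozenset(p) for p in c2}
    items = sorted({i for p in pairs for i in p})
    result = set()
    # Brute force over all k-combinations of items; fine for small examples only.
    for combo in combinations(items, k):
        if all(frozenset(pair) in pairs for pair in combinations(combo, 2)):
            result.add(combo)
    return result
```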
ΔB+ tree: indexing 3D point sets for pattern discovery
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1184033
Xiong Wang
Abstract: Three-dimensional point sets can be used to represent data in different domains. Given a database of 3D point sets, pattern discovery looks for similar subsets that occur in multiple point sets. Geometric hashing has proved to be an effective technique for discovering patterns in 3D point sets; however, the method has shortcomings. We propose a new indexing technique called ΔB+ trees, an extension of B+-trees that stores point-triplet information and overcomes the shortcomings of the geometric hashing technique. We introduce four different ways of constructing the key from a triplet. We give an analytical comparison between the new index structure and the geometric hashing technique, and we conduct experiments on both synthetic and real data to evaluate performance.
Citations: 2
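The abstract does not say which four key constructions the paper uses, so the sketch below shows one hypothetical choice: the triangle's three side lengths, sorted and quantized, which are unchanged by rotating or translating the triplet. A sorted Python list searched with `bisect` stands in for the B+-tree; `triplet_key`, `TripletIndex`, and the `resolution` parameter are illustrative names only.

```python
import math
from bisect import bisect_left, bisect_right


def triplet_key(p, q, r, resolution=0.5):
    """Hypothetical key: sorted, quantized side lengths of triangle pqr.

    Sorting makes the key independent of point order; side lengths are
    invariant under rotation and translation.
    """
    sides = sorted(math.dist(a, b) for a, b in ((p, q), (q, r), (r, p)))
    return tuple(round(s / resolution) for s in sides)


class TripletIndex:
    """Sorted keys with aligned payloads; a stand-in for a B+-tree."""

    def __init__(self):
        self._keys = []    # sorted triplet keys
        self._values = []  # (set_id, triplet) payloads aligned with _keys

    def add(self, set_id, triplet):
        key = triplet_key(*triplet)
        i = bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, (set_id, triplet))

    def lookup(self, triplet):
        """Return every stored (set_id, triplet) whose key equals the query's key."""
        key = triplet_key(*triplet)
        lo, hi = bisect_left(self._keys, key), bisect_right(self._keys, key)
        return self._values[lo:hi]
```

Because of quantization, two nearly identical triangles can land in neighboring keys; querying a small range of adjacent keys would be a natural extension, omitted here.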
Discovery of interesting association rules from Livelink web log data
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1184048
Xiangji Huang, Aijun An, N. Cercone, Gary Promhouse
Abstract: We present our experience in mining web usage patterns from a large collection of Livelink log data. Livelink is a web-based product of Open Text that provides automatic management and retrieval of different types of information objects over an intranet or extranet. We report our experience in preprocessing raw log data and post-processing the mining results to find interesting rules. In particular, we compare and evaluate a number of rule interestingness measures and find that two measures not previously used in association rule learning work very well.
Citations: 31
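The abstract does not name the two measures that worked well, so the sketch below only computes a few standard interestingness measures for a rule A → B from counts (for example, the number of sessions that visit page A, page B, or both). The function name and the choice of measures are assumptions for illustration.

```python
import math


def rule_measures(n, n_a, n_b, n_ab):
    """Standard interestingness measures for a rule A -> B.

    n: total sessions; n_a, n_b: sessions containing A, B; n_ab: containing both.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    confidence = n_ab / n_a                    # P(B | A)
    lift = (n * n_ab) / (n_a * n_b)            # P(AB) / (P(A) P(B))
    leverage = p_ab - p_a * p_b                # P(AB) - P(A) P(B)
    # Conviction is unbounded when the rule never fails.
    conviction = (1 - p_b) / (1 - confidence) if confidence < 1 else math.inf
    return {"support": p_ab, "confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}
```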
A parameterless method for efficiently discovering clusters of arbitrary shape in large datasets
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183901
Andrew Foss, Osmar R. Zaiane
Abstract: Clustering is the problem of grouping data based on similarity; it consists of maximizing intra-group similarity while minimizing inter-group similarity. The problem of clustering data sets is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the well-known k for the number of expected clusters, which constitutes a form of supervision. In this paper we present a new, efficient and scalable clustering algorithm that clusters over a range of resolutions and finds a potential optimum clustering without requiring any parameter input. Our experiments show that our algorithm outperforms most existing clustering algorithms in quality and speed on large data sets.
Citations: 53
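As a contrast to methods that need a user-supplied k, the sketch below shows one well-known parameter-free heuristic: build the single-link merge sequence (Kruskal's minimum spanning tree) and cut where successive merge distances jump the most. It is a swapped-in technique, not the authors' algorithm; its O(n² log n) pairwise sort would not meet the paper's scalability goals, and it always returns at least two clusters.

```python
import math
from itertools import combinations


def largest_gap_clusters(points):
    """Single-link clustering, cut where successive merge distances jump the most.

    Returns one integer cluster label per point.
    """
    n = len(points)
    if n < 3:
        return [0] * n  # too few merges to compare gaps; treat as one cluster
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Kruskal's algorithm: accepted edges are the single-link merge distances.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))

    # Keep every merge up to and including the one just before the largest jump.
    gaps = [merges[k + 1][0] - merges[k][0] for k in range(len(merges) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    parent[:] = range(n)
    for _, i, j in merges[:cut + 1]:
        parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = [find(i) for i in range(n)]
    label_of = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [label_of[r] for r in roots]
```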
Mining top-k frequent closed patterns without minimum support
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183905
Jiawei Han, Jianyong Wang, Ying Lu, P. Tzvetkov
Abstract: In this paper, we propose a new mining task: mining the top-k frequent closed patterns of length no less than min_ℓ, where k is the desired number of frequent closed patterns to be mined and min_ℓ is the minimal length of each pattern. An efficient algorithm, called TFP, is developed for mining such patterns without a minimum support threshold. Two methods, closed-node-count and descendant-sum, are proposed to effectively raise the support threshold and prune the FP-tree both during and after its construction. During the mining process, a novel combined top-down and bottom-up FP-tree mining strategy is developed to speed up support raising and closed frequent pattern discovery. In addition, a fast hash-based closed pattern verification scheme is employed to efficiently check whether a potential closed pattern is really closed. Our performance study shows that in most cases TFP outperforms CLOSET and CHARM, two efficient frequent closed pattern mining algorithms, even when both are run with the best-tuned minimum support. Furthermore, the method can be extended to generate association rules and to incorporate user-specified constraints.
Citations: 311
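A brute-force reference for the task definition (top-k closed patterns with at least `min_len` items, no minimum support) is useful for checking a faster implementation on small inputs. The sketch below enumerates every itemset, so it is exponential and is not TFP; breaking support ties lexicographically is an assumption, since the abstract does not say how ties at the k-th position are handled.

```python
from itertools import combinations


def top_k_closed(transactions, k, min_len):
    """The k most frequent closed itemsets with at least min_len items.

    No minimum support is supplied; every occurring itemset is considered.
    """
    txs = [frozenset(t) for t in transactions]
    items = sorted(set().union(*txs))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            count = sum(1 for t in txs if s <= t)
            if count:
                support[s] = count
    # An itemset is closed if no proper superset has the same support.
    closed = [s for s, c in support.items()
              if not any(s < t and c == support[t] for t in support)]
    ranked = sorted((s for s in closed if len(s) >= min_len),
                    key=lambda s: (-support[s], sorted(s)))
    return [(set(s), support[s]) for s in ranked[:k]]
```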
Feature selection for clustering - a filter solution
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183893
M. Dash, Ki-Hoon Choi, P. Scheuermann, Huan Liu
Abstract: Processing applications with a large number of dimensions has been a challenge for the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential preprocessing method for removing noisy features. Only a few methods have been proposed in the literature for feature selection for clustering, and almost all of them are 'wrapper' techniques that require a clustering algorithm to evaluate candidate feature subsets. The wrapper approach is largely unsuitable for real-world applications because it relies heavily on clustering algorithms that require parameters such as the number of clusters, and because suitable criteria for evaluating clustering in different subspaces are lacking. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The method is based on the observation that data with clusters has a very different point-to-point distance histogram from that of data without clusters. Exploiting this, we propose an entropy measure that is low if the data has distinct clusters and high if it does not. The entropy measure is suitable for selecting the most important subset of features because it is invariant to the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation on synthetic, benchmark, and real datasets shows its effectiveness.
Citations: 437
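The filter idea can be sketched by histogramming normalized pairwise distances and taking the Shannon entropy of that histogram: well-separated clusters concentrate the distances into a few bins. This is a simplified stand-in, not the paper's exact entropy measure; the bin count, the normalization of distances to [0, 1], and scoring one feature at a time (rather than searching over feature subsets) are assumptions.

```python
import math
from itertools import combinations


def distance_entropy(points, bins=10):
    """Shannon entropy (bits) of the histogram of normalized pairwise distances.

    Requires at least two points.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    lo, hi = min(dists), max(dists)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for d in dists:
        # Map each distance to [0, 1] and then to a bin index; the maximum goes in the last bin.
        counts[min(int((d - lo) / span * bins), bins - 1)] += 1
    total = len(dists)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def rank_features(rows, bins=10):
    """Feature indices ordered from lowest entropy (most cluster structure) upward."""
    scores = [distance_entropy([(r[f],) for r in rows], bins) for f in range(len(rows[0]))]
    return sorted(range(len(scores)), key=scores.__getitem__)
```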
A theory of inductive query answering
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09 DOI: 10.1109/ICDM.2002.1183894
L. De Raedt, M. Jaeger, Sau-dan Lee, H. Mannila
Abstract: First, we introduce the Boolean inductive query evaluation problem, which is concerned with answering inductive queries that are arbitrary Boolean expressions over monotonic and anti-monotonic predicates. Second, we develop a decomposition theory for inductive query evaluation in which a Boolean query Q is reformulated into k subqueries Q_i = Q_A ∧ Q_M, each the conjunction of a monotonic and an anti-monotonic predicate. The solution to each subquery can be represented using a version space. We investigate how the number k of version spaces needed to answer the query can be minimized. Third, for the pattern domain of strings, we show how the version spaces can be represented using a novel data structure, called the version space tree, and computed using a variant of the well-known Apriori algorithm. Finally, we present experiments that validate the approach.
Citations: 107
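For the string domain, the solution set of one subquery Q_A ∧ Q_M can be enumerated with an Apriori-style levelwise search. In the sketch below, the anti-monotonic predicate is "occurs in at least `min_pos` of the positive strings" (used for pruning) and the monotonic predicate is "occurs in at most `max_neg` of the negative strings" (used as a filter); both predicates and all names are illustrative assumptions. It lists the solutions explicitly instead of building the paper's version space tree.

```python
def frequency(pattern, strings):
    """Number of strings that contain `pattern` as a substring."""
    return sum(1 for s in strings if pattern in s)


def solve_conjunction(pos, neg, min_pos, max_neg):
    """Patterns p with frequency(p, pos) >= min_pos and frequency(p, neg) <= max_neg."""
    if min_pos < 1:
        raise ValueError("min_pos must be at least 1 so the search terminates")
    alphabet = sorted(set("".join(pos)))
    # Level 1: single characters satisfying the anti-monotonic predicate.
    level = [c for c in alphabet if frequency(c, pos) >= min_pos]
    solutions = []
    while level:
        # The monotonic predicate only filters here; it is not used to prune.
        solutions.extend(p for p in level if frequency(p, neg) <= max_neg)
        # Right-extend survivors by one character. Every frequent pattern's prefix
        # is a substring of it and hence also frequent, so nothing is missed.
        level = [p + c for p in level for c in alphabet
                 if frequency(p + c, pos) >= min_pos]
    return solutions
```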