2002 IEEE International Conference on Data Mining, 2002. Proceedings: Latest Articles

Visually mining Web user clickpaths
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184050
T. Mah, Y. Li
{"title":"Visually mining Web user clickpaths","authors":"T. Mah, Y. Li","doi":"10.1109/ICDM.2002.1184050","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184050","url":null,"abstract":"As powerful as clickpath mining methods can be, they often lead to huge incomprehensible and non-interesting result sets. Our clickpath mining practice at MSN was faced with challenges of keeping analysts closer to the data exploration process, revealing powerful insight from clickpath mining that business owners can directly act upon. These challenges stressed the importance of an interactive and visual representation of clickpath mining results. Most products today that can perform clickpath visualization do so by presenting massive cross-weaving web graphs. We present a new type of clickpath visualization which focuses only on clickpaths of interest, simplifying the visualization space while still retaining the same degree of mineable knowledge in the data. We also describe visualization techniques we have used to enhance the detection of interesting clickpath patterns from data, and provide a real-life case study that has benefited from the use of our implemented clickpath visualizer PAVE.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134282789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Using sequential and non-sequential patterns in predictive Web usage mining tasks
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184025
B. Mobasher, H. Dai, Tao Luo, M. Nakagawa
{"title":"Using sequential and non-sequential patterns in predictive Web usage mining tasks","authors":"B. Mobasher, H. Dai, Tao Luo, M. Nakagawa","doi":"10.1109/ICDM.2002.1184025","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184025","url":null,"abstract":"We describe an efficient framework for Web personalization based on sequential and non-sequential pattern discovery from usage data. Our experimental results performed on real usage data indicate that more restrictive patterns, such as contiguous sequential patterns (e.g., frequent navigational paths) are more suitable for predictive tasks, such as Web prefetching, (which involve predicting which item is accessed next by a user), while less constrained patterns, such as frequent item sets or general sequential patterns are more effective alternatives in the context of Web personalization and recommender systems.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134512767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 197
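The abstract above singles out contiguous sequential patterns (frequent navigational paths) as well suited to predicting the next page a user accesses. A minimal sketch of that idea follows, assuming sessions are plain lists of page identifiers and using a fixed-length contiguous context; the function names `build_next_page_model` and `predict_next` and the `order` parameter are hypothetical, and this is not the authors' framework.

```python
from collections import Counter, defaultdict


def build_next_page_model(sessions, order=2):
    """Count which page follows each contiguous run of `order` pages."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            context = tuple(session[i:i + order])
            model[context][session[i + order]] += 1
    return model


def predict_next(model, recent_pages, order=2):
    """Return the most frequent successor of the last `order` pages, or None."""
    context = tuple(recent_pages[-order:])
    if context not in model:
        return None  # this contiguous path was never observed
    return model[context].most_common(1)[0][0]


# Illustrative sessions only; not data from the paper.
sessions = [
    ["home", "search", "item"],
    ["home", "search", "item"],
    ["home", "search", "cart"],
]
model = build_next_page_model(sessions)
```

Because the context must match contiguously, the model is restrictive: it only answers for paths it has actually seen, which is the trade-off the abstract describes between predictive precision and coverage.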
Discriminative category matching: efficient text classification for huge document collections
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183902
Gabriel K. P. Fung, J. Yu, Hongjun Lu
{"title":"Discriminative category matching: efficient text classification for huge document collections","authors":"Gabriel K. P. Fung, J. Yu, Hongjun Lu","doi":"10.1109/ICDM.2002.1183902","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183902","url":null,"abstract":"With the rapid growth of textual information available on the Internet, having a good model for classifying and managing documents automatically is undoubtedly important. When more documents are archived, new terms, new concepts and concept-drift will frequently appear. Without a doubt, updating the classification model frequently, rather than using the old model for a very long period, is absolutely essential. Here, the challenges are: a) obtain a high accuracy classification model; b) consume low computational time for both model training and operation; and c) occupy low storage space. However, none of the existing classification approaches could achieve all of these requirements. In this paper, we propose a novel text classification approach, called discriminative category matching, which could achieve all of the stated characteristics. Extensive experiments using two benchmarks and a large real-life collection are conducted. The encouraging results indicated that our approach is highly feasible.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115614256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Objective-oriented utility-based association mining
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183938
Yi-Dong Shen, Zhong Zhang, Qiang Yang
{"title":"Objective-oriented utility-based association mining","authors":"Yi-Dong Shen, Zhong Zhang, Qiang Yang","doi":"10.1109/ICDM.2002.1183938","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183938","url":null,"abstract":"The necessity of developing methods for discovering association patterns to increase business utility of an enterprise has long been recognized in the data mining community. This requires modeling specific association patterns that are both statistically (based on support and confidence) and semantically (based on objective utility) related to a given objective that a user wants to achieve or is interested in. However, no such general model has been reported in the literature. Traditional association mining focuses on deriving correlations among a set of items and their association rules; diaper → beer only tells us that a pattern like {diaper} is statistically related to an item like beer. In this paper we present a new approach, called objective-oriented utility-based association (OOA) mining, to modeling such association patterns that are explicitly related to a user's objective and its utility. Due to its focus on a user's objective and the use of objective utility as key semantic information to measure the usefulness of association patterns, OOA mining differs significantly from existing approaches such as existing constraint-based association mining. We formally define OOA mining and develop an algorithm for mining OOA rules. The algorithm is an enhancement of Apriori with specific mechanisms for handling objective utility. We prove that the utility constraint is neither monotone nor anti-monotone, succinct or convertible and present a novel pruning strategy based on the utility constraint to improve the efficiency of OOA mining.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114760920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 85
Toward XML-based knowledge discovery systems
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184024
Rosa Meo, G. Psaila
{"title":"Toward XML-based knowledge discovery systems","authors":"Rosa Meo, G. Psaila","doi":"10.1109/ICDM.2002.1184024","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184024","url":null,"abstract":"Inductive databases are intended to be general purpose databases in which both source data and mined patterns can be represented, retrieved and manipulated. However, the heterogeneity of models for mined patterns makes them difficult to realize. In this paper, we explore the feasibility of using XML as the unifying framework for inductive databases, introducing a suitable data model called XDM (XML for data mining). XDM is designed to describe source raw data, heterogeneous mined patterns and data mining statements, so that they can be stored inside a unique XML-based inductive database.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122966013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
O-Cluster: scalable clustering of large high dimensional data sets
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183915
B. Milenova, M. Campos
{"title":"O-Cluster: scalable clustering of large high dimensional data sets","authors":"B. Milenova, M. Campos","doi":"10.1109/ICDM.2002.1183915","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183915","url":null,"abstract":"Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to address either handling data sets with a very large number of records and/or with a very high number of dimensions. We provide a discussion of the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To simultaneously overcome both the \"curse of dimensionality\" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123537213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 82
Heterogeneous learner for Web page classification
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183999
Hwanjo Yu, K. Chang, Jiawei Han
{"title":"Heterogeneous learner for Web page classification","authors":"Hwanjo Yu, K. Chang, Jiawei Han","doi":"10.1109/ICDM.2002.1183999","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183999","url":null,"abstract":"Classification of an interesting class of Web pages has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners (a decision list and a linear separator, which complement each other) to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on \"low-margin data\". That is, it classifies more accurately while requiring less human effort in training.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121655441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Ensemble modeling through multiplicative adjustment of class probability
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184013
S. Hong, J. Hosking, R. Natarajan
{"title":"Ensemble modeling through multiplicative adjustment of class probability","authors":"S. Hong, J. Hosking, R. Natarajan","doi":"10.1109/ICDM.2002.1184013","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184013","url":null,"abstract":"We develop a new concept for aggregating items of evidence for class probability estimation. In Naive Bayes, each feature contributes an independent multiplicative factor to the estimated class probability. We modify this model to include an exponent in each factor in order to introduce feature importance. These exponents are chosen to maximize the accuracy of estimated class probabilities on the training data. For Naive Bayes, this modification accomplishes more than what feature selection can. More generally, since the individual features can be the outputs of separate probability models, this yields a new ensemble modeling approach, which we call APM (Adjusted Probability Model), along with a regularized version called APMR.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125588473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
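The abstract above describes raising each Naive Bayes factor to an exponent so that features carry different importance. A minimal sketch of the scoring step follows, assuming each factor is a class-conditional likelihood P(x_j | c) and that the exponents are already given; the function name `adjusted_class_probs` and its argument layout are hypothetical, and fitting the exponents on training data (the core of the paper's method) is not shown.

```python
import math


def adjusted_class_probs(prior, likelihoods, exponents):
    """Class probabilities proportional to P(c) * prod_j P(x_j | c) ** w_j.

    prior:       {cls: P(cls)}
    likelihoods: {cls: [P(x_j | cls) for each feature j]}
    exponents:   [w_j for each feature j]; all ones gives plain Naive Bayes
    """
    log_scores = {
        c: math.log(prior[c])
        + sum(w * math.log(p) for w, p in zip(exponents, likelihoods[c]))
        for c in prior
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Illustrative numbers only; not taken from the paper.
prior = {"a": 0.5, "b": 0.5}
likelihoods = {"a": [0.8, 0.5], "b": [0.2, 0.5]}
plain = adjusted_class_probs(prior, likelihoods, [1.0, 1.0])
muted = adjusted_class_probs(prior, likelihoods, [0.0, 1.0])
```

Setting an exponent to zero removes a feature's influence entirely, which is what feature selection does; intermediate values let a feature count partially, which is the sense in which the abstract says the modification accomplishes more than feature selection.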
A comparative study of RNN for outlier detection in data mining
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184035
Graham J. Williams, R. Baxter, Hongxing He, S. Hawkins, Lifang Gu
{"title":"A comparative study of RNN for outlier detection in data mining","authors":"Graham J. Williams, R. Baxter, Hongxing He, S. Hawkins, Lifang Gu","doi":"10.1109/ICDM.2002.1184035","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184035","url":null,"abstract":"We have proposed replicator neural networks (RNNs) for outlier detection. We compare RNN for outlier detection with three other methods using both publicly available statistical datasets (generally small) and data mining datasets (generally much larger and generally real data). The smaller datasets provide insights into the relative strengths and weaknesses of RNNs. The larger datasets in particular test scalability and practicality of application.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126514524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 292
Cluster merging and splitting in hierarchical clustering algorithms
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183896
C. Ding, Xiaofeng He
{"title":"Cluster merging and splitting in hierarchical clustering algorithms","authors":"C. Ding, Xiaofeng He","doi":"10.1109/ICDM.2002.1183896","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183896","url":null,"abstract":"Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. We provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to test 8 selection methods, and find that the average similarity is the best method in divisive clustering and the minmax linkage is the best in agglomerative clustering. Cluster balance is a key factor to achieve good performance. We also introduce the concept of objective function saturation and clustering target distance to effectively assess the quality of clustering.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129211922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 183