2002 IEEE International Conference on Data Mining, 2002. Proceedings. Latest articles

Evaluating the utility of statistical phrases and latent semantic indexing for text classification
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184036
H. Wu, D. Gunopulos
Abstract: The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We discover that the positive effect of adding phrase terms is very limited if we have already achieved good performance using single-word terms, even when SVD/LSI is used as the dimensionality reduction method.
Citations: 24
Feature selection algorithms: a survey and experimental evaluation
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183917
L. Molina, L. B. Muñoz, À. Nebot
Abstract: In view of the substantial number of existing feature selection algorithms, the need arises for criteria that make it possible to decide adequately which algorithm to use in certain situations. This work assesses the performance of several fundamental algorithms found in the literature in a controlled scenario. A scoring measure ranks the algorithms by taking into account the amount of relevance, irrelevance and redundancy in sample data sets. This measure computes the degree of matching between the output given by the algorithm and the known optimal solution. Sample size effects are also studied.
Citations: 692
Empirical comparison of various reinforcement learning strategies for sequential targeted marketing
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183879
N. Abe, E. Pednault, Haixun Wang, B. Zadrozny, W. Fan, C. Apté
Abstract: We empirically evaluate the performance of various reinforcement learning methods in applications to sequential targeted marketing. In particular we propose and evaluate a progression of reinforcement learning methods, ranging from the "direct" or "batch" methods to "indirect" or "simulation based" methods, and those that we call "semi-direct" methods that fall between them. We conduct a number of controlled experiments to evaluate the performance of these competing methods. Our results indicate that while the indirect methods can perform better in a situation in which nearly perfect modeling is possible, under the more realistic situations in which the system's modeling parameters have restricted attention, the indirect methods' performance tends to degrade. We also show that semi-direct methods are effective in reducing the amount of computation necessary to attain a given level of performance, and often result in more profitable policies.
Citations: 22
Mining generalized association rules using pruning techniques
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183907
Yin-Fu Huang, Chieh-Ming Wu
Abstract: The goal of the paper is to mine generalized association rules using pruning techniques. Given a large transaction database and a hierarchical taxonomy tree of the items, we try to find the association rules between the items at different levels in the taxonomy tree, under the assumption that the original frequent itemsets and association rules have already been generated beforehand. In the proposed algorithm GMAR, we use join methods and pruning techniques to generate new generalized association rules. Through several comprehensive experiments, we find that the GMAR algorithm is much better than the BASIC and Cumulate algorithms.
Citations: 47
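The abstract above does not spell out how support is counted across taxonomy levels. As a rough illustration only (this is not the GMAR algorithm, and the taxonomy, item names and transactions below are invented for the example), one simple way to count support for generalized itemsets is to extend each transaction with the ancestors of its items:

```python
# Hypothetical taxonomy: child -> parent. All names are illustrative assumptions.
parent = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}

transactions = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"jacket"},
]

def ancestors(item, parent):
    """Return every ancestor of item by walking up the taxonomy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result

def extend(transaction, parent):
    """Add the ancestors of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return extended

def support(itemset, transactions, parent):
    """Fraction of extended transactions that contain the whole itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= extend(t, parent))
    return hits / len(transactions)

specific = support({"jacket", "hiking boots"}, transactions, parent)
general = support({"outerwear", "hiking boots"}, transactions, parent)
```

In this toy data the generalized itemset {outerwear, hiking boots} is supported by two transactions while {jacket, hiking boots} is supported by only one, which is why rules can appear at higher taxonomy levels even when no leaf-level rule is frequent.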
Mining associated implication networks: computational intermarket analysis
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184030
P. W. Tse, Jiming Liu
Abstract: Current attempts to analyze international financial markets include the use of financial technical analysis and data mining techniques. In this paper, we propose a new approach that incorporates implication networks and association rules to form an associated network structure. The proposed approach explicitly addresses the issue of local vs. global influences between financial markets.
Citations: 3
SLPMiner: an algorithm for finding frequent sequential patterns using length-decreasing support constraint
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183937
Masakazu Seno, G. Karypis
Abstract: Over the years, a variety of algorithms for finding frequent sequential patterns in very large sequential databases have been developed. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of the problem. In general, patterns that contain only a few items will tend to be interesting if they have good support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we need an algorithm that finds all the frequent patterns whose support decreases as a function of their length. In this paper we present an algorithm called SLPMiner that finds all sequential patterns that satisfy a length-decreasing support constraint. Our experimental evaluation shows that SLPMiner achieves up to two orders of magnitude of speedup by effectively exploiting the length-decreasing support constraint, and that its runtime increases gradually as the average length of the sequences (and the discovered frequent patterns) increases.
Citations: 115
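To make the idea of a length-decreasing support constraint in the abstract above concrete, here is a minimal sketch. It only checks candidate patterns against the constraint and is not SLPMiner itself; the linear threshold function, its parameter values and the toy database are assumptions chosen for illustration.

```python
def min_support(length, max_sup=0.7, min_sup=0.2, decay_len=4):
    """Hypothetical non-increasing threshold: linear decay from max_sup
    at length 1 down to min_sup at decay_len and beyond."""
    if length >= decay_len:
        return min_sup
    frac = (length - 1) / (decay_len - 1)
    return max_sup - frac * (max_sup - min_sup)

def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Fraction of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

# Toy sequence database (illustrative only).
database = [list("abcd"), list("abd"), list("acd"), list("bcd")]
candidates = [("a",), ("b", "c"), ("a", "d"), ("a", "b", "c", "d")]

kept = [p for p in candidates
        if support(p, database) >= min_support(len(p))]
```

With these assumed values the short pattern ("b", "c") is rejected at support 0.5 because length-2 patterns need about 0.53, while the length-4 pattern is kept at support 0.25 because long patterns only need 0.2. A constant threshold could not express both decisions at once.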
Towards automatic generation of query taxonomy: a hierarchical query clustering approach
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183888
Shui-Lung Chuang, Lee-Feng Chien
Abstract: Most previous work on automatic query clustering generated a flat, un-nested partition of query terms. In this work, we discuss the organization of query terms into a hierarchical structure and construct a query taxonomy in an automatic way. The proposed approach is based on a hierarchical agglomerative clustering algorithm that hierarchically groups similar queries and generates cluster hierarchies using a novel cluster partition technique. The search processes of real-world search engines are combined to obtain highly ranked Web documents as the feature source for each query term. Preliminary experiments show that the proposed approach is effective for obtaining thesaurus information for query terms, and is also feasible for constructing a query taxonomy which provides a basis for in-depth analysis of users' search interests and domain-specific vocabulary on a larger scale.
Citations: 75
Iterative clustering of high dimensional text data augmented by local search
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183895
I. Dhillon, Yuqiang Guan, J. Kogan
Abstract: The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means can often yield qualitatively poor results, especially when cluster sizes are small, say 25-30 documents per cluster, where it tends to get stuck at a local maximum far away from the optimal solution. In this paper, we present a local search procedure, which we call "first-variation", that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first-variation allows a chain of such moves in a Kernighan-Lin fashion and leads to a better local maximum. Combining the enhanced first-variation with spherical k-means yields a powerful "ping-pong" strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to highlight the improvement achieved by our proposed algorithm in clustering high-dimensional and sparse text data.
Citations: 155
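The local search described in the abstract above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: it uses the standard fact that, for unit-length vectors, the spherical k-means objective equals the sum of the norms of the per-cluster vector sums, and it tries every single-point move instead of computing gains incrementally. The function names and the toy data are hypothetical, and the Kernighan-Lin chain of moves is not shown.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def objective(points, labels, k):
    """Spherical k-means objective: sum over clusters of the norm of the
    cluster's vector sum (an empty cluster contributes 0)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    for p, c in zip(points, labels):
        for j, x in enumerate(p):
            sums[c][j] += x
    return sum(norm(s) for s in sums)

def first_variation_step(points, labels, k):
    """Try every move of one point to another cluster and apply the move
    with the largest positive gain. Returns (labels, gain)."""
    base = objective(points, labels, k)
    best_gain, best_move = 0.0, None
    for i in range(len(points)):
        for c in range(k):
            if c == labels[i]:
                continue
            trial = labels[:i] + [c] + labels[i + 1:]
            gain = objective(points, trial, k) - base
            if gain > best_gain + 1e-12:
                best_gain, best_move = gain, (i, c)
    if best_move is None:
        return labels, 0.0
    i, c = best_move
    return labels[:i] + [c] + labels[i + 1:], best_gain

# Toy data: two directions; the third point starts in the wrong cluster.
points = [normalize(p) for p in ([1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9])]
labels = [0, 0, 0, 1]
new_labels, gain = first_variation_step(points, labels, k=2)
```

In a "ping-pong" scheme of the kind the abstract mentions, a step like this would alternate with ordinary spherical k-means iterations until neither improves the objective.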
Mining association rules from stars
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183919
Eric Ka Ka Ng, A. Fu, Ke Wang
Abstract: Association rule mining is an important data mining problem. It is found to be useful for conventional relational data. However, previous work has mostly targeted mining a single table. In real life, a database is typically made up of multiple tables, and one important case is where some of the tables form a star schema. The tables typically correspond to entity sets, and joining the tables in a star schema gives relationships among entity sets which can be very interesting information. Hence mining on the join result is an important problem. Based on characteristics of the star schema, we propose an efficient algorithm for mining association rules on the join result but without actually performing the join operation. We show that this approach can significantly outperform the join-then-mine approach even when the latter adopts one of the fastest known mining algorithms.
Citations: 56
Exploring the parameter state space of stacking
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184029
A. Seewald
Abstract: Ensemble learning schemes are a new field in data mining. While current research concentrates mainly on improving the performance of single learning algorithms, an alternative is to combine learners with different biases. Stacking is the best-known such scheme, which tries to combine learners' predictions or confidences via another learning algorithm. However, the adoption of stacking into the data mining community is hampered by its large parameter space, consisting mainly of other learning algorithms: (1) the set of learning algorithms to combine; (2) the meta-learner responsible for the combining; and (3) the type of meta-data to use - confidences or predictions. None of these parameters are obvious choices. Furthermore, little is known about the relation between the parameter settings and performance of stacking. By exploring all of stacking's parameter settings and their interdependencies, we attempt to make stacking a suitable choice for mainstream data mining applications.
Citations: 14
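The three parameters listed in the abstract above multiply quickly. The sketch below simply enumerates that space for a tiny, made-up menu of learners; the learner names are placeholders rather than the algorithms evaluated in the paper, and requiring at least two base learners per ensemble is an assumption of this example.

```python
from itertools import combinations, product

# Placeholder names only; not the learners used in the paper.
base_learners = ["naive_bayes", "decision_tree", "nearest_neighbor"]
meta_learners = ["linear_regression", "decision_tree"]
meta_data_types = ["confidences", "predictions"]

def stacking_configurations(base, meta, data_types, min_size=2):
    """Enumerate (base-learner subset, meta-learner, meta-data type)
    combinations, using subsets of at least min_size base learners."""
    subsets = [
        combo
        for size in range(min_size, len(base) + 1)
        for combo in combinations(base, size)
    ]
    return [
        {"base_learners": s, "meta_learner": m, "meta_data": d}
        for s, m, d in product(subsets, meta, data_types)
    ]

configs = stacking_configurations(base_learners, meta_learners, meta_data_types)
# 4 subsets * 2 meta-learners * 2 meta-data types = 16 configurations
print(len(configs))
```

Even this small menu gives 16 configurations, and the number of base-learner subsets grows exponentially with the number of candidate learners, which illustrates why the choice of settings is not obvious.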