2002 IEEE International Conference on Data Mining, 2002. Proceedings.最新文献

筛选
英文 中文
Optimal projections of high dimensional data 高维数据的最佳投影
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184006
E. Corchado, C. Fyfe
{"title":"Optimal projections of high dimensional data","authors":"E. Corchado, C. Fyfe","doi":"10.1109/ICDM.2002.1184006","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184006","url":null,"abstract":"In this paper, we compare two artificial neural network algorithms for performing Exploratory Projection Pursuit, a statistical technique for investigating data by projecting it onto lower dimensional manifolds. The neural networks are extensions of a network which performs Principal Component Analysis. We illustrate the technique on artificial data before applying it to real data.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131061891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Mining significant associations in large scale text corpora 挖掘大规模文本语料库中的重要关联
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183933
P. Raghavan, Panayiotis Tsaparas
{"title":"Mining significant associations in large scale text corpora","authors":"P. Raghavan, Panayiotis Tsaparas","doi":"10.1109/ICDM.2002.1183933","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183933","url":null,"abstract":"Mining large-scale text corpora is an essential step in extracting the key themes in a corpus. We motivate a quantitative measure for significant associations through the distributions of pairs and triplets of co-occurring words. We consider the algorithmic problem of efficiently enumerating such significant associations and present pruning algorithms for these problems, with theoretical as well as empirical analyses. Our algorithms make use of two novel mining methods: (1) matrix mining, and (2) shortened documents. We present evidence from a diverse set of documents that our measure does in fact elicit interesting co-occurrences.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131024901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Evolutionary time series segmentation for stock data mining 股票数据挖掘的演化时间序列分割
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183889
K. F. Chung, Tak-Chung Fu, R. Luk, Vincent Ng
{"title":"Evolutionary time series segmentation for stock data mining","authors":"K. F. Chung, Tak-Chung Fu, R. Luk, Vincent Ng","doi":"10.1109/ICDM.2002.1183889","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183889","url":null,"abstract":"Stock data in the form of multiple time series are difficult to process, analyze and mine. However, when they can be transformed into meaningful symbols like technical patterns, it becomes easier. Most recent work on time series queries concentrates only on how to identify a given pattern from a time series. Researchers do not consider the problem of identifying a suitable set of time points for segmenting the time series in accordance with a given set of pattern templates (e.g., a set of technical patterns for stock analysis). On the other hand, using fixed length segmentation is a primitive approach to this problem; hence, a dynamic approach (with high controllability) is preferred so that the time series can be segmented flexibly and effectively according to the needs of users and applications. In view of the fact that such a segmentation problem is an optimization problem and evolutionary computation is an appropriate tool to solve it, we propose an evolutionary time series segmentation algorithm. This approach allows a sizeable set of stock patterns to be generated for mining or query. In addition, defining the similarity between time series (or time series segments) is of fundamental importance in fitness computation. By identifying perceptually important points directly from the time domain, time series segments and templates of different lengths can be compared and intuitive pattern matching can be carried out in an effective and efficient manner. Encouraging experimental results are reported from tests that segment the time series of selected Hong Kong stocks.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130765051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 57
Progressive modeling 先进的建模
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183899
W. Fan, Haixun Wang, Philip S. Yu, S. Lo, S. Stolfo
{"title":"Progressive modeling","authors":"W. Fan, Haixun Wang, Philip S. Yu, S. Lo, S. Stolfo","doi":"10.1109/ICDM.2002.1183899","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183899","url":null,"abstract":"Presently, inductive learning is still performed in a frustrating batch process. The user has little interaction with the system and no control over the final accuracy and training time. If the accuracy of the produced model is too low, all the computing resources are misspent. In this paper we propose a progressive modeling framework. In progressive modeling, the learning algorithm estimates online both the accuracy of the final model and remaining training time. If the estimated accuracy is far below expectation, the user can terminate training prior to completion without wasting further resources. If the user chooses to complete the learning process, progressive modeling will compute a model with expected accuracy in expected time. We describe one implementation of progressive modeling using ensemble of classifiers.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132150935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Text document categorization by term association 按术语关联对文本文档进行分类
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183881
M. Antonie, Osmar R Zaiane
{"title":"Text document categorization by term association","authors":"M. Antonie, Osmar R Zaiane","doi":"10.1109/ICDM.2002.1183881","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183881","url":null,"abstract":"A good text classifier is a classifier that efficiently categorizes large sets of text documents in a reasonable time frame and with an acceptable accuracy, and that provides classification rules that are human readable for possible fine-tuning. If the training of the classifier is also quick, this could become in some application domains a good asset for the classifier. Many techniques and algorithms for automatic text categorization have been devised. According to published literature, some are more accurate than others, and some provide more interpretable classification models than others. However, none can combine all the beneficial properties enumerated above. In this paper we present a novel approach for automatic text categorization that borrows from market basket analysis techniques using association rule mining in the data-mining field. We focus on two major problems: (1) finding the best term association rules in a textual database by generating and pruning; and (2) using the rules to build a text classifier. Our text categorization method proves to be efficient and effective, and experiments on well-known collections show that the classifier performs well. In addition, training as well as classification are both fast and the generated rules are human readable.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128829258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 264
Comparison of lazy Bayesian rule, and tree-augmented Bayesian learning 懒惰贝叶斯规则和树增强贝叶斯学习的比较
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183993
Zhihai Wang, Geoffrey I. Webb
{"title":"Comparison of lazy Bayesian rule, and tree-augmented Bayesian learning","authors":"Zhihai Wang, Geoffrey I. Webb","doi":"10.1109/ICDM.2002.1183993","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183993","url":null,"abstract":"The naive Bayes classifier is widely used in interactive applications due to its computational efficiency, direct theoretical base, and competitive accuracy. However its attribute independence assumption can result in sub-optimal accuracy. A number of techniques have explored simple relaxations of the attribute independence assumption in order to increase accuracy. Among these, the lazy Bayesian rule (LBR) and the tree-augmented naive Bayes (TAN) have demonstrated strong prediction accuracy. However their relative performance has never been evaluated. The paper compares and contrasts these two techniques, finding that they have comparable accuracy and hence should be selected according to computational profile. LBR is desirable when small numbers of objects are to be classified while TAN is desirable when large numbers of objects are to be classified.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116128041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
Attribute (feature) completion - the theory of attributes from data mining prospect 属性(特征)补全——来自数据挖掘前景的属性理论
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1183914
T. Lin
{"title":"Attribute (feature) completion - the theory of attributes from data mining prospect","authors":"T. Lin","doi":"10.1109/ICDM.2002.1183914","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1183914","url":null,"abstract":"A \"correct\" selection of attributes (features) is vital in data mining. As a first step, this paper constructs all possible attributes of a given relation. The results are based on the observations that each relation is isomorphic to a unique abstract relation, called a canonical model. The complete set of attributes of the canonical model is, then, constructed. Any attribute of a relation can be interpreted (via isomorphism) from such a complete set.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"41 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121017408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
Maintenance of sequential patterns for record modification using pre-large sequences 维护使用预大序列进行记录修改的顺序模式
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184031
Ching-Yao Wang, T. Hong, S. Tseng
{"title":"Maintenance of sequential patterns for record modification using pre-large sequences","authors":"Ching-Yao Wang, T. Hong, S. Tseng","doi":"10.1109/ICDM.2002.1184031","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184031","url":null,"abstract":"In previous work we proposed incremental mining algorithms for maintenance of sequential patterns based on the concept of pre-large sequences as records were inserted or deleted. Although maintenance of sequential patterns for record modification can be performed by using the deletion procedure and then the insertion procedure, double the computation time of a single procedure is needed. In this paper, we attempt to apply the concept of pre-large sequences to maintain sequential patterns as records are modified. The proposed algorithm does not require rescanning original databases until the accumulative number of modified customer sequences exceeds a safety bound derived by a pre-large concept. As databases grow larger, the number of modified customer sequences allowed before database rescanning also needs to grow.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122481134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Association analysis with one scan of databases 通过一次数据库扫描进行关联分析
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184015
Hao Huang, Xindong Wu, R. Relue
{"title":"Association analysis with one scan of databases","authors":"Hao Huang, Xindong Wu, R. Relue","doi":"10.1109/ICDM.2002.1184015","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184015","url":null,"abstract":"Mining frequent patterns with an FP-tree avoids costly candidate generation and repeatedly occurrence frequency checking against the support threshold. It therefore achieves better performance and efficiency than Apriori-like algorithms. However the database still needs to be scanned twice to get the FP-tree. This can be very time-consuming when new data are added to an existing database because two scans may be needed for not only the new data but also the existing data. This paper presents a new data structure P-tree, Pattern Tree, and a new technique, which can get the P-tree through only one scan of the database and can obtain the corresponding FP-tree with a specified support threshold. Updating a P-tree with new data needs one scan of the new data only, and the existing data do not need to be re-scanned.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131508958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 76
Mining a set of coregulated RNA sequences 挖掘一组协同调节的RNA序列
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI: 10.1109/ICDM.2002.1184014
Yuh-Jyh Hu
{"title":"Mining a set of coregulated RNA sequences","authors":"Yuh-Jyh Hu","doi":"10.1109/ICDM.2002.1184014","DOIUrl":"https://doi.org/10.1109/ICDM.2002.1184014","url":null,"abstract":"Post-transcriptional regulation, though less studied, is an important research topic in bioinformatics. In a set of post-transcriptionally coregulated RNAs, the basepair interactions can organize the molecules into domains and provide a framework for functional interactions. Their consensus motifs may represent the binding sites of RNA regulatory proteins. Unlike DNA motifs, RNA motifs are more conserved in structures than in sequences. Knowing the structural motifs can help us better understand the regulation activities. We propose a novel data mining approach to RNA secondary structure prediction. To demonstrate the performance of our new approach, we first tested it on the same data sets previously used and published in literature. Secondly, to show the flexibility of our new approach, we also tested it on a data set that contains pseudoknot motifs that most current systems cannot identify.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127728189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信