2008 IEEE International Conference on Data Mining Workshops: Latest Papers (Page 5)

Character String Analysis and Customer Path in Stream Data

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.41

K. Yada

Citations: 0

Scalable Sparse Bayesian Network Learning for Spatial Applications

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.124

T. Liebig, Christine Kopp, M. May

Citations: 13

Semi-supervised Collaborative Clustering with Partial Background Knowledge

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.116

G. Forestier, Cédric Wemmert, P. Gançarski

Citations: 3

Wavelet-Based Data Perturbation for Simultaneous Privacy-Preserving and Statistics-Preserving

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.77

Lian Liu, Jie Wang, Jun Zhang

Citations: 44

Bounding and Estimating Association Rule Support from Clusters on Binary Data

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.47

C. Ordonez, Kai Zhao, Zhibo Chen

Abstract: The theoretical relationship between association rules and machine learning techniques needs to be studied in more depth. This article studies the use of clustering as a model for association rule mining. The clustering model is exploited to bound and estimate association rule support and confidence. We first study the efficient computation of the clustering model with K-means; we show the sufficient statistics for clustering on binary data sets is the linear sum of points. We then prove item set support can be bounded and estimated from the model. Finally, we show support bounds fulfill the set downward closure property. Experiments study model accuracy and algorithm speed, paying particular attention to error behavior in support estimation. Given a sufficiently large number of clusters, the model becomes fairly accurate to approximate support. However, as the minimum support threshold decreases accuracy also decreases. The model is fairly accurate to discover a large fraction of frequent itemsets at different support levels. The model is compared against a traditional association rule algorithm to mine frequent itemsets, exhibiting better performance at low support levels. Time complexity to compute the binary cluster model is linear on data set size, whereas the dimensionality of transaction data sets has marginal impact on time.
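As a rough illustration of the idea in the abstract above, the following Python sketch keeps only the size and the linear sum of points for each cluster of binary data, then bounds and estimates itemset support from those statistics. The bounds used here are the standard Fréchet inequalities applied within each cluster, and the point estimate assumes items are independent inside a cluster; both are assumptions chosen for illustration and are not necessarily the formulas used in the paper. The function names `cluster_stats` and `support_bounds` are hypothetical, and the cluster labels are taken as given (for example, from a prior K-means run).

```python
from typing import List, Sequence, Tuple


def cluster_stats(
    points: Sequence[Sequence[int]], labels: Sequence[int], k: int
) -> Tuple[List[int], List[List[int]]]:
    """Return per-cluster sizes and linear sums for binary points.

    For 0/1 data, the linear sum of a cluster counts how many of its
    points contain each item, so the centroid is linear_sum / size.
    """
    d = len(points[0])
    sizes = [0] * k
    sums = [[0] * d for _ in range(k)]
    for x, j in zip(points, labels):
        sizes[j] += 1
        for i, bit in enumerate(x):
            sums[j][i] += bit
    return sizes, sums


def support_bounds(
    sizes: Sequence[int], sums: Sequence[Sequence[int]], itemset: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (lower, upper, estimate) for the support of `itemset`.

    Assumption: Frechet bounds per cluster, weighted by cluster size;
    the estimate multiplies item frequencies (within-cluster independence).
    """
    if not itemset:
        raise ValueError("itemset must contain at least one item index")
    n = sum(sizes)
    lower = upper = estimate = 0.0
    for n_j, l_j in zip(sizes, sums):
        if n_j == 0:
            continue  # empty clusters contribute nothing
        freqs = [l_j[i] / n_j for i in itemset]
        weight = n_j / n
        lower += weight * max(0.0, sum(freqs) - (len(freqs) - 1))
        upper += weight * min(freqs)
        product = 1.0
        for f in freqs:
            product *= f
        estimate += weight * product
    return lower, upper, estimate


# Toy data: 4 transactions over 3 items, already assigned to 2 clusters.
points = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1]]
labels = [0, 0, 1, 1]
sizes, sums = cluster_stats(points, labels, k=2)
lower, upper, estimate = support_bounds(sizes, sums, itemset=[0, 1])
```

Because only sizes and linear sums are consulted, the bounds can be computed without rescanning the transactions; tighter clusters narrow the gap between the lower and upper bound.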

Citations: 3

Risk Assessment of Atmospheric Hazard Releases Using K-Means Clustering

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.89

G. Cervone, P. Franzese, Y. Ezber, Z. Boybeyi

Citations: 8

Semantic Features for Multi-view Semi-supervised and Active Learning of Text Classification

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.13

Shiliang Sun

Citations: 10

Mining Unstructured Text at Gigabyte per Second Speeds

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.9

A. Ratner

Citations: 0

An Adaptive Pre-filtering Technique for Error-Reduction Sampling in Active Learning

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.52

Michael Davy, S. Luz

Abstract: Error-reduction sampling (ERS) is a high performing (but computationally expensive) query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering which populates the subset with more informative examples by filtering from the unlabelled pool using a query selection strategy. In this paper we establish whether pre-filtering outperforms sub-sampling optimisation, examine the effect of subset size, and propose a novel adaptive pre-filtering technique which dynamically switches between several alternative pre-filtering techniques using a multi-armed bandit algorithm. Empirical evaluations conducted on two benchmark text categorisation datasets demonstrate that pre-filtered ERS achieve higher levels of accuracy when compared to sub-sampled ERS. The proposed adaptive pre-filtering technique is also shown to be competitive with the optimal pre-filtering technique on the majority of problems and is never the worst technique.
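The abstract above does not say which multi-armed bandit algorithm is used, so the following Python sketch uses epsilon-greedy purely as an assumption to show how a learner might switch between pre-filtering techniques. The class name `EpsilonGreedyBandit`, the arm names, and the reward values are all hypothetical; in particular, how a reward would be measured after each query round is not stated in the source.

```python
import random
from typing import List


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed list of named arms.

    Assumption: each arm stands for one pre-filtering technique, and the
    caller supplies a numeric reward after using the chosen technique.
    """

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self._rng = random.Random(seed)

    def select(self) -> int:
        """Return the index of the arm to use next."""
        # Try every arm once before trusting the running means.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.means[i])

    def update(self, arm: int, reward: float) -> None:
        """Fold one observed reward into the running mean of `arm`."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical arm names; the source does not list the techniques compared.
bandit = EpsilonGreedyBandit(["filter_a", "filter_b", "filter_c"], epsilon=0.0)
for reward in (0.1, 0.9, 0.5):  # made-up rewards for the first three rounds
    arm = bandit.select()
    bandit.update(arm, reward)
best = bandit.select()
```

In an active learning loop, each round would call `select()` to pick a pre-filter, build the candidate subset with it, run ERS on that subset, and then call `update()` with whatever reward signal the experimenter defines.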

Citations: 4

High Granularity Remote Sensing and Crop Production over Space and Time: NDVI over the Growing Season and Prediction of Cotton Yields at the Farm Field Level in Texas

2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.91

B. Little, M. Schucking, B. Gartrell, Bing Chen, K. Ross, R. McKellip

Citations: 6