2009 IEEE International Conference on Data Mining Workshops: Latest Papers, Page 9

Tree-Based Approach to Missing Data Imputation

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.92

P. Vateekul, Kanoksri Sarinnapakorn

Citations: 22

Semantic-Rich Markov Models for Web Prefetching

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.18

Nizar R. Mabroukeh, C. Ezeife

Citations: 50

An Effective Network Partitioning Algorithm Based on Two-Point Diffusing Strategy

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.26

Chengying Mao

Citations: 0

Why Naive Ensembles Do Not Work in Cloud Computing

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.85

Wenxuan Gao, R. Grossman, Philip S. Yu, Yunhong Gu

Citations: 7

TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.90

Haimonti Dutta, Xianshu Zhu, Tushar Mahule, H. Kargupta, K. Borne, Codrina Lauth, Florian Holz, Gerhard Heyer

Abstract: The amount of text data on the Internet is growing at a very fast rate. Online text repositories for news agencies, digital libraries and other organizations currently store gigabytes and terabytes of data. Large amounts of unstructured text pose a serious challenge for data mining and knowledge extraction. End-user participation coupled with distributed computation can play a crucial role in meeting these challenges. In many applications involving classification of text documents, web users often participate in the tagging process. This collaborative tagging results in the formation of large-scale Peer-to-Peer (P2P) systems which can function, scale and self-organize in the presence of a highly transient population of nodes and do not need a central server for coordination. In this paper, we describe TagLearner, a P2P classifier learning system for extracting patterns from text data where the end users can participate both in the task of labeling the data and building a distributed classifier on it. We present a novel distributed linear programming based classification algorithm which is asynchronous in nature. The paper also provides extensive empirical results on text data obtained from an online repository, the NSF Abstracts Data.

Citations: 10

An Attack on the Privacy of Sanitized Data that Fuses the Outputs of Multiple Data Miners

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.28

Michal Sramka, R. Safavi-Naini, J. Denzinger

Citations: 11

A Semi-supervised Framework for Simultaneous Classification and Regression of Zero-Inflated Time Series Data with Application to Precipitation Prediction

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.80

Zubin Abraham, P. Tan

Citations: 9

Feature Selection for Maximizing the Area Under the ROC Curve

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.25

Rui Wang, K. Tang

Abstract: Feature selection is an important pre-processing step for solving classification problems. A good feature selection method may not only improve the performance of the final classifier, but also reduce its computational complexity. Traditionally, feature selection methods were developed to maximize the classification accuracy of a classifier. Recently, both theoretical and experimental studies revealed that a classifier with the highest accuracy might not be ideal in real-world problems. Instead, the Area Under the ROC Curve (AUC) has been suggested as the alternative metric, and many existing learning algorithms have been modified in order to seek the classifier with maximum AUC. However, little work was done to develop new feature selection methods to suit the requirement of AUC maximization. To fill this gap in the literature, we propose in this paper a novel algorithm, called the AUC and Rank Correlation coefficient Optimization (ARCO) algorithm. ARCO adopts the general framework of a well-known method, namely the minimal-redundancy-maximal-relevance (mRMR) criterion, but defines the terms "relevance" and "redundancy" in totally different ways. Such a modification looks trivial from the perspective of algorithmic design. Nevertheless, an experimental study on four gene expression data sets showed that feature subsets obtained by ARCO resulted in classifiers with significantly larger AUC than the feature subsets obtained by mRMR. Moreover, ARCO also outperformed the Feature Assessment by Sliding Thresholds algorithm, which was recently proposed for AUC maximization, and thus the efficacy of ARCO was validated.
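The abstract above names the ingredients of ARCO (an mRMR-style greedy framework, AUC, and a rank correlation coefficient) but does not give its formulas. The following is only a minimal sketch of what such a selection loop might look like, not the authors' algorithm. It assumes that relevance is the distance of a single feature's AUC from chance (0.5), that redundancy is the mean absolute Spearman correlation with the features already chosen, and that the two are combined by subtraction. The function names `auc`, `ranks`, `spearman`, and `select_features` are hypothetical.

```python
from itertools import product


def auc(scores, labels):
    """AUC of one feature used as a ranking score (pairwise form, ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (var_a * var_b) ** 0.5


def select_features(columns, labels, k):
    """Greedy mRMR-style selection; returns the indices of the chosen columns."""
    # Assumed relevance: distance of the single-feature AUC from chance, so a
    # feature that ranks the classes in reverse order still counts as useful.
    relevance = [abs(auc(col, labels) - 0.5) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            # Assumed redundancy: mean |Spearman| against already chosen features.
            redundancy = sum(abs(spearman(columns[j], columns[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    f0 = [1, 2, 3, 4, 5, 6]     # separates the classes perfectly
    f1 = [2, 4, 6, 8, 10, 12]   # same ordering as f0, so fully redundant
    f2 = [3, 1, 5, 2, 6, 4]     # weaker, but less correlated with f0
    print(select_features([f0, f1, f2], labels, 2))
```

In this toy input the second pick is the weaker but less redundant feature rather than the duplicate of the first, which is the behavior an mRMR-style criterion is meant to produce.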

Citations: 51

Discovering Domain Specific Concepts within User-Generated Taxonomies

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.57

Jonathan Klinginsmith, M. Mahoui, Yuqing Wu, Josette F. Jones

Citations: 2

Theoretically Optimal Distributed Anomaly Detection

2009 IEEE International Conference on Data Mining Workshops Pub Date: 2009-12-06 DOI: 10.1109/ICDMW.2009.40

A. Lazarevic, Nisheeth Srivastava, Ashutosh Tiwari, Joshua D. Isom, N. Oza, J. Srivastava

Citations: 7