2009 IEEE International Conference on Data Mining Workshops: Latest Papers, Page 4

Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.38

Zuobing Xu, Christopher Hogan, Robert S. Bauer

Abstract: Active learning algorithms actively select training examples for which to acquire labels from domain experts, and they are very effective at reducing human labeling effort in supervised learning. To reduce computational time in training, as well as to provide a more convenient user interaction environment, it is necessary to select batches of new training examples instead of a single example. Batch mode active learning algorithms incorporate a diversity measure to construct a batch of diversified candidate examples. Existing approaches use greedy algorithms to make this feasible at the scale of thousands of examples. Greedy algorithms, however, are not efficient enough to scale to even larger real-world classification applications, which contain millions of examples. In this paper, we present an extremely efficient active learning algorithm. This new active learning algorithm achieves the same results as the traditional greedy algorithm while reducing run time by a factor of several hundred. We prove that the objective function of the algorithm is submodular, which guarantees that the algorithm finds the same solution as the greedy algorithm. We evaluate our approach on several large-scale real-world text classification problems and show that it achieves substantial speedups while obtaining the same classification accuracy.

Citations: 6
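The Xu, Hogan, and Bauer abstract does not spell out its algorithm. One standard way that submodularity lets an algorithm return exactly the greedy answer at far lower cost is lazy (accelerated) greedy evaluation: marginal gains of a submodular function can only shrink as the batch grows, so stale gains are valid upper bounds and most candidates never need re-scoring. The minimal sketch below assumes a toy set-coverage objective in place of the paper's batch objective; the objective, data, and function names are hypothetical, and it is not presented as the authors' method.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Toy stand-in objective (assumption): f(S) = |union of the item sets in S|,
# a monotone submodular set-coverage function. The paper's objective differs.
def marginal_gain(candidate: Set[int], covered: Set[int]) -> int:
    return len(candidate - covered)

def greedy(cands: Dict[str, Set[int]], k: int) -> Tuple[List[str], int]:
    """Plain greedy: re-score every remaining candidate at every step."""
    chosen: List[str] = []
    covered: Set[int] = set()
    evals = 0
    for _ in range(min(k, len(cands))):
        best, best_gain = None, -1
        for name in sorted(cands):  # ties go to the alphabetically first name
            if name in chosen:
                continue
            evals += 1
            gain = marginal_gain(cands[name], covered)
            if gain > best_gain:
                best, best_gain = name, gain
        chosen.append(best)
        covered |= cands[best]
    return chosen, evals

def lazy_greedy(cands: Dict[str, Set[int]], k: int) -> Tuple[List[str], int]:
    """Lazy greedy: cached gains are upper bounds because gains only shrink
    under submodularity, so an entry whose gain is fresh for this round and
    still sits on top of the heap is exactly the greedy choice."""
    # Heap entries: (-gain, name, round in which the gain was computed).
    heap = [(-len(s), name, 0) for name, s in cands.items()]
    heapq.heapify(heap)
    chosen: List[str] = []
    covered: Set[int] = set()
    evals = len(cands)  # initial gains f({x}) are computed once
    for rnd in range(min(k, len(cands))):
        while True:
            _neg_gain, name, stamp = heapq.heappop(heap)
            if stamp == rnd:  # gain is current for this round: take it
                chosen.append(name)
                covered |= cands[name]
                break
            evals += 1  # stale upper bound: re-score and push back
            heapq.heappush(heap, (-marginal_gain(cands[name], covered), name, rnd))
    return chosen, evals

cands = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 5}, "e": {8}}
print(greedy(cands, 3))       # (['c', 'a', 'e'], 12)
print(lazy_greedy(cands, 3))  # (['c', 'a', 'e'], 9)
```

Both variants select the same batch; the lazy version needs fewer gain evaluations, and the gap tends to widen as the candidate pool grows.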

A New Minimally Supervised Learning Method for Semantic Term Classification - Experimental Results on Classifying Ratable Aspects Discussed in Customer Reviews

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.58

T. Nguyen, Takahiro Hayashi, R. Onai, Yuhei Nishioka, Takamasa Takenaka, Masaya Mori

Abstract: We present Bautext, a new minimally supervised approach for automatically extracting ratable aspects from customer reviews and classifying them into previously defined categories. Bautext requires a small number of seed words as supervised data and uses a bootstrapping mechanism to progressively collect new members for each category. Learning new category members and the category-specific terms for each category at the same time is the distinctive classification mechanism of Bautext. Category-specific terms are terms that play an important role in correctly extracting new category members. Furthermore, we propose an additional Trash category to filter out aspects that are not of interest, which leads to a significant improvement in precision at the cost of a limited decrease in recall. Experimental results on a Japanese hotel review dataset show that Bautext outperforms the alternative techniques in both precision and recall, and significantly in running time. In a further comparison with AdaBoost (as the state-of-the-art machine learning technique for semantic term classification), we found that AdaBoost requires about 50% of the data for training to deliver performance similar to what Bautext achieves with fewer than ten selected seed words per category.

Citations: 1
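The Bautext abstract describes its mechanism only at a high level. The sketch below is a generic seed-word bootstrapping loop, not Bautext itself. It assumes each candidate term is represented by a set of context words, treats the most frequent context words around a category's current members as that category's "category-specific terms", and models the Trash category as an ordinary category whose members are discarded at the end. The data, the overlap score, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

def bootstrap(contexts: Dict[str, Set[str]], seeds: Dict[str, Set[str]],
              rounds: int = 2, top_ctx: int = 3) -> Dict[str, Set[str]]:
    """Generic seed-word bootstrapping (illustration only, not Bautext)."""
    members = {cat: set(words) for cat, words in seeds.items()}
    for _ in range(rounds):
        assigned = set().union(*members.values())
        # Step 1: category-specific terms = the context words seen most often
        # around the category's current members (ties broken alphabetically).
        cat_terms: Dict[str, Set[str]] = {}
        for cat, mem in members.items():
            counts = Counter(w for m in mem for w in contexts.get(m, ()))
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            cat_terms[cat] = {w for w, _count in ranked[:top_ctx]}
        # Step 2: each unassigned term joins the category whose specific terms
        # it shares most (ties broken alphabetically); zero overlap means skip.
        for term in sorted(set(contexts) - assigned):
            scores = {cat: len(contexts[term] & t) for cat, t in cat_terms.items()}
            best = max(sorted(scores), key=lambda c: scores[c])
            if scores[best] > 0:
                members[best].add(term)
    return members

def drop_trash(members: Dict[str, Set[str]], trash: str = "trash") -> Dict[str, Set[str]]:
    """Discard whatever the Trash category absorbed."""
    return {cat: mem for cat, mem in members.items() if cat != trash}

# Hypothetical toy data: term -> context words observed near it in reviews.
contexts = {
    "room": {"clean", "spacious", "bed"},
    "bathroom": {"clean", "towel", "shower"},
    "suite": {"spacious", "bed", "view"},
    "breakfast": {"tasty", "buffet", "coffee"},
    "dinner": {"tasty", "menu", "buffet"},
    "lunch": {"menu", "coffee", "tasty"},
    "yesterday": {"arrived", "late"},
    "weekend": {"arrived", "trip"},
}
seeds = {"rooms": {"room"}, "food": {"breakfast"}, "trash": {"yesterday"}}

result = drop_trash(bootstrap(contexts, seeds))
print({cat: sorted(mem) for cat, mem in result.items()})
# {'rooms': ['bathroom', 'room', 'suite'], 'food': ['breakfast', 'dinner', 'lunch']}
```

Here "weekend" is absorbed by the Trash category rather than forced into an aspect category, which is the kind of precision gain the abstract attributes to the Trash category.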

K-BestMatch Reconstruction and Comparison of Trajectory Data

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.62

M. Nanni, R. Trasarti

Citations: 4

A Paradigm Shift: Combined Literature and Ontology-Driven Data Mining for Discovering Novel Relations in Biomedical Domain

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.56

Y. Sebastian, B. C. Loh, P. Then

Citations: 4

A WordNet-Based Semantic Model for Enhancing Text Clustering

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.86

Shady Shehata

Abstract: Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term (word or phrase) frequency captures the importance of a term within a document. However, to achieve a more accurate analysis, the underlying mining technique should identify terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. Incorporating semantic features from the WordNet lexical database is one of many approaches that have been tried to improve the accuracy of text clustering techniques. A new semantic-based model that analyzes documents based on their meaning is introduced. The proposed model analyzes terms and their corresponding synonyms and/or hypernyms at the sentence and document levels. In this model, if two documents contain different words that are semantically related, the model can still measure the semantic similarity between the two documents. The similarity between documents relies on a new semantic-based similarity measure that is applied to the concepts matched between documents. Experiments applying the proposed model to text clustering are conducted. Experimental results demonstrate that the newly developed semantic-based model substantially enhances the clustering quality of document sets.

Citations: 36
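The WordNet-based abstract does not give its similarity formula. The minimal sketch below illustrates only the general idea it describes: words are mapped to concepts (synonyms share a concept), one level of hypernyms is added, and documents are compared by cosine similarity over concept counts. It assumes a hypothetical hand-written mini-lexicon in place of WordNet (real code would query WordNet, for example through NLTK), and it is not the paper's measure.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical mini-lexicon standing in for WordNet (assumption, not real data):
# word -> concept id (synonyms share an id) and concept -> hypernym concept.
CONCEPT: Dict[str, str] = {"car": "CAR", "automobile": "CAR", "truck": "TRUCK",
                           "vehicle": "VEHICLE", "apple": "APPLE", "banana": "BANANA"}
HYPERNYM: Dict[str, str] = {"CAR": "VEHICLE", "TRUCK": "VEHICLE",
                            "APPLE": "FRUIT", "BANANA": "FRUIT"}

def concept_vector(tokens: List[str]) -> Counter:
    """Map words to concepts and add one level of hypernyms."""
    vec: Counter = Counter()
    for tok in tokens:
        concept = CONCEPT.get(tok)
        if concept is None:
            vec["word:" + tok] += 1  # unknown word: keep its surface form
            continue
        vec[concept] += 1
        if concept in HYPERNYM:
            vec[HYPERNYM[concept]] += 1
    return vec

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

d1, d2, d3 = ["car", "truck"], ["automobile", "vehicle"], ["apple", "banana"]
print(cosine(Counter(d1), Counter(d2)))                           # 0.0 (no shared words)
print(round(cosine(concept_vector(d1), concept_vector(d2)), 3))   # 0.913
print(cosine(concept_vector(d1), concept_vector(d3)))             # 0.0
```

Plain word overlap rates d1 and d2 as unrelated, while the concept vectors match through the shared CAR and VEHICLE concepts; d3 stays unrelated because its concepts lead to FRUIT instead.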

Motivating Complex Dependence Structures in Data Mining: A Case Study with Anomaly Detection in Climate

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.37

S. Kao, A. Ganguly, K. Steinhaeuser

Citations: 22

Detecting and Interpreting Variable Interactions in Observational Ornithology Data

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.84

Daria Sorokina, R. Caruana, Mirek Riedewald, W. Hochachka, S. Kelling

Citations: 8

Spatiotemporal Modeling and Monitoring of Atmospheric Hazardous Emissions Using Sensor Networks

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.67

G. Cervone, A. Stefanidis, P. Franzese, P. Agouris

Citations: 1

Towards a Universal Text Classifier: Transfer Learning Using Encyclopedic Knowledge

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.101

Pu Wang, C. Domeniconi

Citations: 13

Multiple Instance Transfer Learning

2009 IEEE International Conference on Data Mining Workshops Pub Date : 2009-12-06 DOI: 10.1109/ICDMW.2009.72

Dan Zhang, Luo Si

Citations: 10