2010 IEEE International Conference on Data Mining Workshops: Latest Papers

Integer Programming for Multi-class Active Learning
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.148
Dragomir Yankov, Suju Rajan, A. Ratnaparkhi
Abstract: Active learning has been demonstrated to be a powerful tool for improving the effectiveness of binary classifiers. It iteratively identifies informative unlabeled examples, which after labeling are used to augment the initial training set. Adapting the procedure to large-scale, multi-class classification problems, however, poses certain challenges. For instance, to guarantee improvement by the method we may need to select a large number of examples that require prohibitive labeling resources. Furthermore, the notion of informative examples also changes significantly when multiple classes are considered. In this paper we show that multi-class active learning can be cast into an integer programming framework, where a subset of examples that are informative across the maximum number of classes is selected. We test our approach on several large-scale document categorization problems. We demonstrate that in the case of limited labeling resources and a large number of classes the proposed method is more effective than other known approaches.
Citations: 1
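Illustrative sketch (not from the paper): the abstract describes choosing, under a limited labeling budget, a subset of examples that are informative across as many classes as possible. The abstract does not give the actual objective or constraints, so the version below assumes a simple coverage objective, a hypothetical input format (`informative[i]` is the set of classes for which example `i` is informative), and uses exhaustive search as a stand-in for an integer programming solver.

```python
from itertools import combinations

def select_examples(informative, budget):
    """Pick at most `budget` examples that together cover the most classes.

    informative[i] is the set of class ids for which example i is assumed
    to be informative (a hypothetical input format, not the paper's).
    Exhaustive search replaces a real integer-programming solver here, so
    this only scales to toy inputs.
    """
    best_subset, best_covered = (), set()
    for k in range(1, budget + 1):
        for subset in combinations(range(len(informative)), k):
            covered = set().union(*(informative[i] for i in subset))
            # Keep the first subset that strictly improves class coverage.
            if len(covered) > len(best_covered):
                best_subset, best_covered = subset, covered
    return list(best_subset), best_covered

# Five unlabeled examples, five classes (0..4), budget of two labels.
informative = [{0, 1}, {1, 2}, {2, 3}, {0}, {3, 4}]
chosen, covered = select_examples(informative, budget=2)
print(chosen, covered)
```

With a budget of two, no pair of examples can cover all five classes in this toy input, so the search settles for four; a real solver would express the same choice with one binary variable per example.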
Enhancing Document Exploration with OLAP
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.37
Zhibo Chen, Carlos Garcia-Alvarado, C. Ordonez
Abstract: Finding relevant documents in digital libraries has been a well-studied problem in information retrieval. It is not uncommon to see users browsing digital collections without having a clear idea of the keyword search that they should perform. However, we believe that such an initial query search is not totally independent from the target search. Therefore, we use these initial document selections to further explore these documents. In the following demonstration, we exploit On-line Analytical Processing (OLAP) for knowledge discovery in digital collections to achieve query refinement. Such refinement is the result of applying a traditional ranking technique, based on the vector space model, selecting the top keywords in the resulting subset of documents, and then displaying certain cuboids of the keywords. Based on these cuboids, which are ranked by their frequency, the users can select a query that can better represent their actual target search. We show that this document exploration can be done efficiently within the DBMS and can exploit in-database extensions, such as User-Defined Functions, as well as standard SQL. Additionally, we demonstrate a novel approach to obtaining query refinement through OLAP data cubes.
Citations: 3
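Illustrative sketch (not from the paper): the abstract describes selecting the top keywords in a ranked subset of documents and then showing keyword cuboids ranked by frequency. The paper does this inside the DBMS with SQL and User-Defined Functions; the in-memory Python below is only a stand-in, and the function name, parameters, and sample documents are all assumptions.

```python
from collections import Counter
from itertools import combinations

def keyword_cuboids(docs, top_k=3, max_dim=2):
    """Count how many documents contain each combination of top keywords.

    docs is a list of keyword lists for the already-ranked documents.
    Each combination of up to `max_dim` keywords plays the role of one
    cuboid cell; cells are returned ordered by descending frequency.
    """
    doc_sets = [set(d) for d in docs]
    freq = Counter(w for d in doc_sets for w in d)
    # Descending document frequency, ties broken alphabetically.
    top = sorted(freq, key=lambda w: (-freq[w], w))[:top_k]
    cells = Counter()
    for d in doc_sets:
        present = sorted(w for w in top if w in d)
        for dim in range(1, max_dim + 1):
            for combo in combinations(present, dim):
                cells[combo] += 1
    return sorted(cells.items(), key=lambda kv: (-kv[1], kv[0]))

docs = [
    ["olap", "cube", "query"],
    ["olap", "query", "ranking"],
    ["olap", "cube"],
    ["ranking", "query"],
]
ranked = keyword_cuboids(docs)
for combo, count in ranked:
    print(combo, count)
```

A user could then pick one of the frequent keyword combinations as a refined query.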
A Block Mixture Model for Pattern Discovery in Preference Data
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.59
Nicola Barbieri, M. Guarascio, G. Manco
Abstract: This paper presents a probabilistic co-clustering approach to pattern discovery in preference data. We extended the original formulation of the block mixture model to handle rating data; the resulting model allows the simultaneous clustering of users and items into homogeneous user communities and item categories. The parameters of the model are determined using a variational approximation and a two-phase application of the EM algorithm. The experimental evaluation showed that the proposed approach can be used both for rating prediction and pattern discovery tasks, such as the analysis of common trends within the same user community and the identification of interesting relationships between products belonging to the same item category. In particular, using MovieLens data, we show how it is possible to infer topics for each item category, and how to model community interests and transitions among topics of interest.
Citations: 5
On Attribute Disclosure in Randomization Based Privacy Preserving Data Publishing
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.76
Ling Guo, Xiaowei Ying, Xintao Wu
Abstract: Privacy preserving microdata publication has received wide attention. In this paper, we investigate the randomization approach and focus on attribute disclosure under linking attacks. We give efficient solutions to determine optimal distortion parameters such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects (reconstructed distributions, accuracy of answering queries, and preservation of correlations). Our empirical results show that randomization incurs significantly smaller utility loss.
Citations: 9
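Illustrative sketch (not from the paper): the abstract refers to randomizing a sensitive attribute with a distortion parameter and reconstructing distributions from the published data. The code below assumes one common scheme, in which each value is kept with probability `p` and otherwise replaced by a different value chosen uniformly; the paper's actual distortion matrices and its method for choosing optimal parameters are not stated in the abstract. The attribute values are hypothetical.

```python
import random
from collections import Counter

def randomize(values, domain, p, rng):
    """Keep each value with probability p, else swap in another domain value."""
    out = []
    for v in values:
        if rng.random() < p:
            out.append(v)
        else:
            out.append(rng.choice([d for d in domain if d != v]))
    return out

def reconstruct(published, domain, p):
    """Estimate the original distribution from the randomized data.

    Under the assumed scheme the observed share of value j is
    p * pi_j + q * (1 - pi_j) with q = (1 - p) / (len(domain) - 1),
    so pi_j = (observed_j - q) / (p - q). Requires p != 1 / len(domain).
    """
    q = (1 - p) / (len(domain) - 1)
    counts = Counter(published)
    n = len(published)
    return {d: (counts[d] / n - q) / (p - q) for d in domain}

rng = random.Random(0)
domain = ["flu", "cold", "none"]  # hypothetical sensitive values
original = ["flu"] * 600 + ["cold"] * 300 + ["none"] * 100
published = randomize(original, domain, p=0.7, rng=rng)
estimate = reconstruct(published, domain, p=0.7)
print(estimate)
```

A larger `p` preserves more utility but discloses more about each individual, which is the trade-off the abstract refers to when it mentions determining optimal distortion parameters.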
System Biology Approach for Elucidating the Relationship Between Indonesian Herbal Plants and the Efficacy of Jamu
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.105
F. Afendi, L. K. Darusman, Aki Hirai, M. Altaf-Ul-Amin, Hiroki Takahashi, Kensuke Nakamura, S. Kanaya
Abstract: Jamu is an Indonesian herbal medicine made from a mixture of several plants. Some plants act as main ingredients and the others as supporting ingredients. Using a biplot configuration, we explored the relationship between Indonesian herbal plants and the efficacy of jamu. Among 465 plants used in 3138 jamu, we determined that 190 plants were efficacious for at least one efficacy. We therefore consider these plants to be the main ingredients of jamu. The other 275 plants are considered to be supporting ingredients in jamu because their efficacy has not been established.
Citations: 15
A Framework for Emotion Mining from Text in Online Social Networks
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.75
Mohamed Yassine, Hazem M. Hajj
Abstract: Online Social Networks are so popular nowadays that they are a major component of an individual’s social interaction. They are also emotionally rich environments where close friends share their emotions, feelings and thoughts. In this paper, a new framework is proposed for characterizing emotional interactions in social networks, and then using these characteristics to distinguish friends from acquaintances. The goal is to extract the emotional content of texts in online social networks. The interest is in whether the text is an expression of the writer’s emotions or not. For this purpose, text mining techniques are performed on comments retrieved from a social network. The framework includes a model for data collection, database schemas, data processing and data mining steps. The informal language of online social networks is a main point to consider before performing any text mining techniques. This is why the framework includes the development of special lexicons. In general, the paper presents a new perspective for studying friendship relations and the expression of emotions in online social networks, one that deals with the nature of these sites and the nature of the language used. It considers Lebanese Facebook users as a case study. The technique adopted is unsupervised; it mainly uses the k-means clustering algorithm. Experiments show high accuracy for the model in both determining subjectivity of texts and predicting friendship.
Citations: 105
QueRIE: A Query Recommender System Supporting Interactive Database Exploration
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.43
Sarika Mittal, Jothi Swarubini Vindhiya Varman, Gloria Chatzopoulou, M. Eirinaki, N. Polyzotis
Abstract: This demonstration presents QueRIE, a recommender system that supports interactive database exploration. This system aims at assisting non-expert users of scientific databases by generating personalized query recommendations. Drawing inspiration from Web recommender systems, QueRIE tracks the querying behavior of each user and identifies potentially “interesting” parts of the database related to the corresponding data analysis task by locating those database parts that were accessed by similar users in the past. It then generates and recommends the queries that cover those parts to the user.
Citations: 18
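Illustrative sketch (not QueRIE's implementation): the abstract describes finding database parts accessed by similar past users and recommending queries that cover those parts. The code below assumes each session is summarized as a weight per database part, uses cosine similarity between sessions, and ranks logged queries by the predicted weight of the parts they cover. The session format, the similarity measure, the table names, and the query log are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(current, past_sessions, query_log, top_n=1):
    """Rank logged queries by predicted interest for the current user."""
    predicted = {}
    for other in past_sessions:
        sim = cosine(current, other)
        for part, weight in other.items():
            predicted[part] = predicted.get(part, 0.0) + sim * weight

    def score(parts):
        # Only count parts the current user has not touched yet.
        return sum(predicted.get(p, 0.0) for p in parts if p not in current)

    ranked = sorted(query_log, key=lambda q: (-score(query_log[q]), q))
    return ranked[:top_n]

current = {"stars": 1.0}
past_sessions = [{"stars": 1.0, "galaxies": 1.0}, {"planets": 1.0}]
query_log = {
    "SELECT * FROM galaxies": {"galaxies"},
    "SELECT * FROM planets": {"planets"},
}
print(recommend(current, past_sessions, query_log))
```

Here the first past session overlaps with the current user's activity, so the query covering the part that session also touched is ranked first.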
Large-Scale Customized Models for Advertisers
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.157
A. Bagherjeiran, A. O. Hatch, A. Ratnaparkhi, R. Parekh
Abstract: Performance advertisers want to maximize the return on their advertising spend. In the online advertising world, this means showing the ad only to those users most likely to convert, i.e., buy a product or service. Existing ad targeting solutions such as context targeting and rule-based segment targeting primarily leverage marketing intuition to identify audience segments that would be likely to convert. Even the more sophisticated model-based approaches such as behavioral targeting identify audience segments interested in certain coarse-grained categories defined by the publisher. Advertisers are now able, through beaconing, to tell us exactly who their preferred customers are. Advertisers want to augment their existing advertising campaign with custom models that learn from the campaign and focus on attracting new users. Motivated by our experience with advertisers, we pose this problem within the context of ensemble learning. Building custom models for an existing ad campaign can be viewed as operations on an ensemble classifier: add, modify, or complement a classifier. An ideal new classifier should incrementally improve the ensemble and minimize overlap with any existing classifiers already in the ensemble; it should learn something new. With the proposed approach we are able to augment the advertising campaigns of several large advertisers at a large online advertising company.
Citations: 14
RnR: Extracting Rationale from Online Reviews and Ratings
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.167
Dwi A. P. Rahayu, S. Krishnaswamy, O. Alahakoon, C. Labbé
Abstract: Review mining is a part of web mining which focuses on extracting the main information from user reviews. State-of-the-art review mining systems focus on identifying the semantic orientation of reviews and providing sentences or feature scores. There has been little focus on understanding the rationale for the ratings that are provided. This paper presents our proposed RnR system for extracting rationale from online reviews and ratings. We have implemented the system for evaluation on online reviews for hotels from TripAdvisor.com and present extensive experimental evaluation that demonstrates the improved computational performance of our approach and the accuracy in terms of identifying the rationale. This RnR system is available for testing from http://rnrsystem.com/RnRSystem
Citations: 9
S4: Distributed Stream Computing Platform
2010 IEEE International Conference on Data Mining Workshops Pub Date : 2010-12-13 DOI: 10.1109/ICDMW.2010.172
L. Neumeyer, B. Robbins, Anish Nair, Anand Kesari
Abstract: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. In this paper, we outline the S4 architecture in detail and describe various applications, including real-life deployments. Our design is primarily driven by large-scale applications for data mining and machine learning in a production environment. We show that the S4 design is surprisingly flexible and lends itself to run in large clusters built with commodity hardware.
Citations: 968
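Illustrative sketch (not the S4 API): the abstract describes keyed events routed with affinity to Processing Elements, which consume events and then emit further events, publish results, or both. The single-process Python below mimics only that routing idea; the class names, the `(stream, key, payload)` event tuple, and the `run` loop are assumptions, and nothing here is distributed or fault-tolerant.

```python
from collections import deque

class WordCountPE:
    """One instance per distinct word; counts events and emits updates."""
    def __init__(self, key):
        self.key, self.count = key, 0

    def process(self, payload, emit, publish):
        self.count += 1
        # Emit a new keyed event for a downstream PE to consume.
        emit(("counts", "all", (self.key, self.count)))

class TopWordPE:
    """Single instance; publishes whenever a new highest count appears."""
    def __init__(self, key):
        self.best = (None, 0)

    def process(self, payload, emit, publish):
        word, count = payload
        if count > self.best[1]:
            self.best = (word, count)
            publish(self.best)

def run(events, pe_types):
    """Route each (stream, key, payload) event to the PE for that key."""
    queue, instances, published = deque(events), {}, []
    while queue:
        stream, key, payload = queue.popleft()
        if (stream, key) not in instances:
            # Affinity: the same (stream, key) always reaches the same PE.
            instances[(stream, key)] = pe_types[stream](key)
        instances[(stream, key)].process(payload, queue.append, published.append)
    return published

events = [("words", w, w) for w in "a b a c a b".split()]
published = run(events, {"words": WordCountPE, "counts": TopWordPE})
print(published)
```

Because each PE instance owns the state for one key, the per-key logic stays simple, which is the encapsulation property the abstract attributes to the Actors-like design.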