Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining: latest articles

Training and testing of recommender systems on data missing not at random
H. Steck
{"title":"Training and testing of recommender systems on data missing not at random","authors":"H. Steck","doi":"10.1145/1835804.1835895","DOIUrl":"https://doi.org/10.1145/1835804.1835895","url":null,"abstract":"Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. As to test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). As to achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is to account for all ratings - whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75613515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 345
Discovery of significant emerging trends
Saurabh Goorha, Lyle Ungar
{"title":"Discovery of significant emerging trends","authors":"Saurabh Goorha, Lyle Ungar","doi":"10.1145/1835804.1835815","DOIUrl":"https://doi.org/10.1145/1835804.1835815","url":null,"abstract":"We describe a system that monitors social and mainstream media to determine shifts in what people are thinking about a product or company. We process over 100,000 news articles, blog posts, review sites, and tweets a day for mentions of items (e.g., products) of interest, extract phrases that are mentioned near them, and determine which of the phrases are of greatest possible interest to, for example, brand managers. Case studies show a good ability to rapidly pinpoint emerging subjects buried deep in large volumes of data and then highlight those that are rising or falling in significance as they relate to the firms interests. The tool and algorithm improves the signal-to-noise ratio and pinpoints precisely the opportunities and risks that matter most to communications professionals and their organizations.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74053719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 78
Learning to combine discriminative classifiers: confidence based
Chi-Hoon Lee
{"title":"Learning to combine discriminative classifiers: confidence based","authors":"Chi-Hoon Lee","doi":"10.1145/1835804.1835899","DOIUrl":"https://doi.org/10.1145/1835804.1835899","url":null,"abstract":"Much of research in data mining and machine learning has led to numerous practical applications. Spam filtering, fraud detection, and user query-intent analysis has relied heavily on machine learned classifiers, and resulted in improvements in robust classification accuracy. Combining multiple classifiers (a.k.a. Ensemble Learning) is a well studied and has been known to improve effectiveness of a classifier. To address two key challenges in Ensemble Learning-- (1) learning weights of individual classifiers and (2) the combination rule of their weighted responses, this paper proposes a novel Ensemble classifier, EnLR, that computes weights of responses from discriminative classifiers and combines their weighted responses to produce a single response for a test instance. The combination rule is based on aggregating weighted responses, where a weight of an individual classifier is inversely based on their respective variances around their responses. Here, variance quantifies the uncertainty of the discriminative classifiers' parameters, which in turn depends on the training samples. As opposed to other ensemble methods where the weight of each individual classifier is learned as a part of parameter learning and thus the same weight is applied to all testing instances, our model is actively adjusted as individual classifiers become confident at its decision for a test instance. Our empirical experiments on various data sets demonstrate that our combined classifier produces \"effective\" results when compared with a single classifier. Our novel classifier shows statistically significant better accuracy when compared to well known Ensemble methods -- Bagging and AdaBoost.\nIn addition to robust accuracy, our model is extremely efficient dealing with high volumes of training samples due to the independent learning paradigm among its multiple classifiers. It is simple to implement in a distributed computing environment such as Hadoop.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75547650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Suggesting friends using the implicit social graph
Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, I. Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom
{"title":"Suggesting friends using the implicit social graph","authors":"Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, I. Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom","doi":"10.1145/1835804.1835836","DOIUrl":"https://doi.org/10.1145/1835804.1835836","url":null,"abstract":"Although users of online communication tools rarely categorize their contacts into groups such as \"family\", \"co-workers\", or \"jogging buddies\", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their \"friends\". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends.\nFinally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72705928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 241
Mining program workflow from interleaved traces
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, Bin Wu
{"title":"Mining program workflow from interleaved traces","authors":"Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, Bin Wu","doi":"10.1145/1835804.1835883","DOIUrl":"https://doi.org/10.1145/1835804.1835883","url":null,"abstract":"Successful software maintenance is becoming increasingly critical due to the increasing dependence of our society and economy on software systems. One key problem of software maintenance is the difficulty in understanding the evolving software systems. Program workflows can help system operators and administrators to understand system behaviors and verify system executions so as to greatly facilitate system maintenance. In this paper, we propose an algorithm to automatically discover program workflows from event traces that record system events during system execution. Different from existing workflow mining algorithms, our approach can construct concurrent workflows from traces of interleaved events. Our workflow mining approach is a three-step coarse-to-fine algorithm. At first, we mine temporal dependencies for each pair of events. Then, based on the mined pair-wise temporal dependencies, we construct a basic workflow model by a breadth-first path pruning algorithm. After that, we refine the workflow by verifying it with all training event traces. The refinement algorithm tries to find out a workflow that can interpret all event traces with minimal state transitions and threads.\nThe results of both simulation data and real program data show that our algorithm is highly effective.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77605835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 91
Estimating rates of rare events with multiple hierarchies through scalable log-linear models
D. Agarwal, Rahul Agrawal, Rajiv Khanna, Nagaraj Kota
{"title":"Estimating rates of rare events with multiple hierarchies through scalable log-linear models","authors":"D. Agarwal, Rahul Agrawal, Rajiv Khanna, Nagaraj Kota","doi":"10.1145/1835804.1835834","DOIUrl":"https://doi.org/10.1145/1835804.1835834","url":null,"abstract":"We consider the problem of estimating rates of rare events for high dimensional, multivariate categorical data where several dimensions are hierarchical. Such problems are routine in several data mining applications including computational advertising, our main focus in this paper. We propose LMMH, a novel log-linear modeling method that scales to massive data applications with billions of training records and several million potential predictors in a map-reduce framework. Our method exploits correlations in aggregates observed at multiple resolutions when working with multiple hierarchies; stable estimates at coarser resolution provide informative prior information to improve estimates at finer resolutions. Other than prediction accuracy and scalability, our method has an inbuilt variable screening procedure based on a \"spike and slab prior\" that provides parsimony by removing non-informative predictors without hurting predictive accuracy. We perform large scale experiments on data from real computational advertising applications and illustrate our approach on datasets with several billion records and hundreds of millions of predictors.\nExtensive comparisons with other benchmark methods show significant improvements in prediction accuracy.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78339400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 81
Session details: Research track 17: social network analysis
Brian D. Davison
{"title":"Session details: Research track 17: social network analysis","authors":"Brian D. Davison","doi":"10.1145/3248797","DOIUrl":"https://doi.org/10.1145/3248797","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78465905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Overlapping experiment infrastructure: more, better, faster experimentation
Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer
{"title":"Overlapping experiment infrastructure: more, better, faster experimentation","authors":"Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer","doi":"10.1145/1835804.1835810","DOIUrl":"https://doi.org/10.1145/1835804.1835810","url":null,"abstract":"At Google, experimentation is practically a mantra; we evaluate almost every change that potentially affects what our users experience. Such changes include not only obvious user-visible changes such as modifications to a user interface, but also more subtle changes such as different machine learning algorithms that might affect ranking or content selection. Our insatiable appetite for experimentation has led us to tackle the problems of how to run more experiments, how to run experiments that produce better decisions, and how to run them faster. In this paper, we describe Google's overlapping experiment infrastructure that is a key component to solving these problems. In addition, because an experiment infrastructure alone is insufficient, we also discuss the associated tools and educational processes required to use it effectively. We conclude by describing trends that show the success of this overall experimental environment.\nWhile the paper specifically describes the experiment system and experimental processes we have in place at Google, we believe they can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85437256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 288
Optimizing debt collections using constrained reinforcement learning
N. Abe, Prem Melville, Cezar Pendus, C. Reddy, David L. Jensen, V. P. Thomas, J. Bennett, Gary F. Anderson, Brent R. Cooley, Melissa Kowalczyk, Mark Domick, Timothy Gardinier
{"title":"Optimizing debt collections using constrained reinforcement learning","authors":"N. Abe, Prem Melville, Cezar Pendus, C. Reddy, David L. Jensen, V. P. Thomas, J. Bennett, Gary F. Anderson, Brent R. Cooley, Melissa Kowalczyk, Mark Domick, Timothy Gardinier","doi":"10.1145/1835804.1835817","DOIUrl":"https://doi.org/10.1145/1835804.1835817","url":null,"abstract":"The problem of optimally managing the collections process by taxation authorities is one of prime importance, not only for the revenue it brings but also as a means to administer a fair taxing system. The analogous problem of debt collections management in the private sector, such as banks and credit card companies, is also increasingly gaining attention. With the recent successes in the applications of data analytics and optimization to various business areas, the question arises to what extent such collections processes can be improved by use of leading edge data modeling and optimization techniques. In this paper, we propose and develop a novel approach to this problem based on the framework of constrained Markov Decision Process (MDP), and report on our experience in an actual deployment of a tax collections optimization system at New York State Department of Taxation and Finance (NYS DTF).","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85466456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 78
Mining positive and negative patterns for relevance feature discovery
Yuefeng Li, Abdulmohsen Algarni, N. Zhong
{"title":"Mining positive and negative patterns for relevance feature discovery","authors":"Yuefeng Li, Abdulmohsen Algarni, N. Zhong","doi":"10.1145/1835804.1835900","DOIUrl":"https://doi.org/10.1145/1835804.1835900","url":null,"abstract":"It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features.\nSubstantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89584633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97