Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第10页

Training and testing of recommender systems on data missing not at random 针对非随机缺失数据的推荐系统进行培训和测试

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835895

H. Steck

引用次数: 345

Discovery of significant emerging trends 发现重要的新趋势

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835815

Saurabh Goorha, Lyle Ungar

引用次数: 78

Learning to combine discriminative classifiers: confidence based 学习结合判别分类器:基于信心

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835899

Chi-Hoon Lee

{"title":"Learning to combine discriminative classifiers: confidence based","authors":"Chi-Hoon Lee","doi":"10.1145/1835804.1835899","DOIUrl":"https://doi.org/10.1145/1835804.1835899","url":null,"abstract":"Much of research in data mining and machine learning has led to numerous practical applications. Spam filtering, fraud detection, and user query-intent analysis has relied heavily on machine learned classifiers, and resulted in improvements in robust classification accuracy. Combining multiple classifiers (a.k.a. Ensemble Learning) is a well studied and has been known to improve effectiveness of a classifier. To address two key challenges in Ensemble Learning-- (1) learning weights of individual classifiers and (2) the combination rule of their weighted responses, this paper proposes a novel Ensemble classifier, EnLR, that computes weights of responses from discriminative classifiers and combines their weighted responses to produce a single response for a test instance. The combination rule is based on aggregating weighted responses, where a weight of an individual classifier is inversely based on their respective variances around their responses. Here, variance quantifies the uncertainty of the discriminative classifiers' parameters, which in turn depends on the training samples. As opposed to other ensemble methods where the weight of each individual classifier is learned as a part of parameter learning and thus the same weight is applied to all testing instances, our model is actively adjusted as individual classifiers become confident at its decision for a test instance. Our empirical experiments on various data sets demonstrate that our combined classifier produces \"effective\" results when compared with a single classifier. Our novel classifier shows statistically significant better accuracy when compared to well known Ensemble methods -- Bagging and AdaBoost. In addition to robust accuracy, our model is extremely efficient dealing with high volumes of training samples due to the independent learning paradigm among its multiple classifiers. It is simple to implement in a distributed computing environment such as Hadoop.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75547650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Suggesting friends using the implicit social graph 使用内隐社交图推荐朋友

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835836

Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, I. Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom

{"title":"Suggesting friends using the implicit social graph","authors":"Maayan Roth, Assaf Ben-David, David Deutscher, Guy Flysher, I. Horn, Ari Leichtberg, Naty Leiser, Yossi Matias, Ron Merom","doi":"10.1145/1835804.1835836","DOIUrl":"https://doi.org/10.1145/1835804.1835836","url":null,"abstract":"Although users of online communication tools rarely categorize their contacts into groups such as \"family\", \"co-workers\", or \"jogging buddies\", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their \"friends\". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72705928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 241

Mining program workflow from interleaved traces 从交错的轨迹挖掘程序工作流

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835883

Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, Bin Wu

{"title":"Mining program workflow from interleaved traces","authors":"Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, Bin Wu","doi":"10.1145/1835804.1835883","DOIUrl":"https://doi.org/10.1145/1835804.1835883","url":null,"abstract":"Successful software maintenance is becoming increasingly critical due to the increasing dependence of our society and economy on software systems. One key problem of software maintenance is the difficulty in understanding the evolving software systems. Program workflows can help system operators and administrators to understand system behaviors and verify system executions so as to greatly facilitate system maintenance. In this paper, we propose an algorithm to automatically discover program workflows from event traces that record system events during system execution. Different from existing workflow mining algorithms, our approach can construct concurrent workflows from traces of interleaved events. Our workflow mining approach is a three-step coarse-to-fine algorithm. At first, we mine temporal dependencies for each pair of events. Then, based on the mined pair-wise tem-poral dependencies, we construct a basic workflow model by a breadth-first path pruning algorithm. After that, we refine the workflow by verifying it with all training event traces. The re-finement algorithm tries to find out a workflow that can interpret all event traces with minimal state transitions and threads. The results of both simulation data and real program data show that our algorithm is highly effective.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"210 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77605835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 91

Estimating rates of rare events with multiple hierarchies through scalable log-linear models 利用可扩展对数线性模型估计多层次罕见事件的概率

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835834

D. Agarwal, Rahul Agrawal, Rajiv Khanna, Nagaraj Kota

{"title":"Estimating rates of rare events with multiple hierarchies through scalable log-linear models","authors":"D. Agarwal, Rahul Agrawal, Rajiv Khanna, Nagaraj Kota","doi":"10.1145/1835804.1835834","DOIUrl":"https://doi.org/10.1145/1835804.1835834","url":null,"abstract":"We consider the problem of estimating rates of rare events for high dimensional, multivariate categorical data where several dimensions are hierarchical. Such problems are routine in several data mining applications including computational advertising, our main focus in this paper. We propose LMMH, a novel log-linear modeling method that scales to massive data applications with billions of training records and several million potential predictors in a map-reduce framework. Our method exploits correlations in aggregates observed at multiple resolutions when working with multiple hierarchies; stable estimates at coarser resolution provide informative prior information to improve estimates at finer resolutions. Other than prediction accuracy and scalability, our method has an inbuilt variable screening procedure based on a \"spike and slab prior\" that provides parsimony by removing non-informative predictors without hurting predictive accuracy. We perform large scale experiments on data from real computational advertising applications and illustrate our approach on datasets with several billion records and hundreds of millions of predictors. Extensive comparisons with other benchmark methods show significant improvements in prediction accuracy.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78339400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 81

Session details: Research track 17: social network analysis 议题17:社会网络分析

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248797

Brian D. Davison

引用次数: 0

Overlapping experiment infrastructure: more, better, faster experimentation 重叠实验基础设施:更多、更好、更快的实验

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835810

Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer

{"title":"Overlapping experiment infrastructure: more, better, faster experimentation","authors":"Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer","doi":"10.1145/1835804.1835810","DOIUrl":"https://doi.org/10.1145/1835804.1835810","url":null,"abstract":"At Google, experimentation is practically a mantra; we evaluate almost every change that potentially affects what our users experience. Such changes include not only obvious user-visible changes such as modifications to a user interface, but also more subtle changes such as different machine learning algorithms that might affect ranking or content selection. Our insatiable appetite for experimentation has led us to tackle the problems of how to run more experiments, how to run experiments that produce better decisions, and how to run them faster. In this paper, we describe Google's overlapping experiment infrastructure that is a key component to solving these problems. In addition, because an experiment infrastructure alone is insufficient, we also discuss the associated tools and educational processes required to use it effectively. We conclude by describing trends that show the success of this overall experimental environment. While the paper specifically describes the experiment system and experimental processes we have in place at Google, we believe they can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85437256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 288

Optimizing debt collections using constrained reinforcement learning 利用约束强化学习优化债务回收

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835817

N. Abe, Prem Melville, Cezar Pendus, C. Reddy, David L. Jensen, V. P. Thomas, J. Bennett, Gary F. Anderson, Brent R. Cooley, Melissa Kowalczyk, Mark Domick, Timothy Gardinier

引用次数: 78

Mining positive and negative patterns for relevance feature discovery 挖掘积极和消极模式的相关特征发现

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835900

Yuefeng Li, Abdulmohsen Algarni, N. Zhong

{"title":"Mining positive and negative patterns for relevance feature discovery","authors":"Yuefeng Li, Abdulmohsen Algarni, N. Zhong","doi":"10.1145/1835804.1835900","DOIUrl":"https://doi.org/10.1145/1835804.1835900","url":null,"abstract":"It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89584633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 97