Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第9页

Representing documents through their readers 通过读者表示文档

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487596

Khalid El-Arini, Min Xu, E. Fox, Carlos Guestrin

{"title":"Representing documents through their readers","authors":"Khalid El-Arini, Min Xu, E. Fox, Carlos Guestrin","doi":"10.1145/2487575.2487596","DOIUrl":"https://doi.org/10.1145/2487575.2487596","url":null,"abstract":"From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers? To address this question, we model the content of news articles and blog posts by attributes of the people who are likely to share them. For example, many Twitter users describe themselves in a short profile, labeling themselves with phrases such as \"vegetarian\" or \"liberal.\" By assuming that a user's labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. Thereafter, we can code any new document as a sparse non-negative linear combination of user labels, where we encourage correlated labels to appear together in the output via a structured sparsity penalty. Finally, we show that our approach yields a novel document representation that can be effectively used in many problem settings, from recommendation to modeling news dynamics. For example, while the top politics stories will change drastically from one month to the next, the \"politics\" label will still be there to describe them. We evaluate our model on millions of tweeted news articles and blog posts collected between September 2010 and September 2012, demonstrating that our approach is effective.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"85-86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73234605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

Forex-foreteller: currency trend modeling using news articles 外汇预测:使用新闻文章进行货币趋势建模

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487710

Fang Jin, Nathan Self, Parang Saraf, P. Butler, W. Wang, Naren Ramakrishnan

引用次数: 41

Optimizing parallel belief propagation in junction treesusing regression 基于回归的连接树并行信念传播优化

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487611

Lu Zheng, O. Mengshoel

引用次数: 19

Silence is also evidence: interpreting dwell time for recommendation from psychological perspective 沉默也是证据:从心理学角度解释推荐的停留时间

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487663

Peifeng Yin, Ping Luo, Wang-Chien Lee, Min Wang

{"title":"Silence is also evidence: interpreting dwell time for recommendation from psychological perspective","authors":"Peifeng Yin, Ping Luo, Wang-Chien Lee, Min Wang","doi":"10.1145/2487575.2487663","DOIUrl":"https://doi.org/10.1145/2487575.2487663","url":null,"abstract":"Social media is a platform for people to share and vote content. From the analysis of the social media data we found that users are quite inactive in rating/voting. For example, a user on average only votes 2 out of 100 accessed items. Traditional recommendation methods are mostly based on users' votes and thus can not cope with this situation. Based on the observation that the dwell time on an item may reflect the opinion of a user, we aim to enrich the user-vote matrix by converting the dwell time on items into users' ``pseudo votes'' and then help improve recommendation performance. However, it is challenging to correctly interpret the dwell time since many subjective human factors, e.g. user expectation, sensitivity to various item qualities, reading speed, are involved into the casual behavior of online reading. In psychology, it is assumed that people have choice threshold in decision making. The time spent on making decision reflects the decision maker's threshold. This idea inspires us to develop a View-Voting model, which can estimate how much the user likes the viewed item according to her dwell time, and thus make recommendations even if there is no voting data available. Finally, our experimental evaluation shows that the traditional rate-based recommendation's performance is greatly improved with the support of VV model.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"92 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73298605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 62

Gaussian multiple instance learning approach for mapping the slums of the world using very high resolution imagery 高斯多实例学习方法用于使用非常高分辨率的图像绘制世界贫民窟

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488210

Ranga Raju Vatsavai

引用次数: 36

Ad click prediction: a view from the trenches 广告点击预测:从战壕的角度来看

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488200

H. B. McMahan, Gary Holt, D. Sculley, Michael Young, D. Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, D. Golovin, S. Chikkerur, Dan Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, J. Kubica

{"title":"Ad click prediction: a view from the trenches","authors":"H. B. McMahan, Gary Holt, D. Sculley, Michael Young, D. Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, D. Golovin, S. Chikkerur, Dan Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, J. Kubica","doi":"10.1145/2487575.2488200","DOIUrl":"https://doi.org/10.1145/2487575.2488200","url":null,"abstract":"Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74335317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 910

Selective sampling on graphs for classification 对图进行选择性抽样进行分类

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487641

Quanquan Gu, C. Aggarwal, Jialu Liu, Jiawei Han

{"title":"Selective sampling on graphs for classification","authors":"Quanquan Gu, C. Aggarwal, Jialu Liu, Jiawei Han","doi":"10.1145/2487575.2487641","DOIUrl":"https://doi.org/10.1145/2487575.2487641","url":null,"abstract":"Selective sampling is an active variant of online learning in which the learner is allowed to adaptively query the label of an observed example. The goal of selective sampling is to achieve a good trade-off between prediction performance and the number of queried labels. Existing selective sampling algorithms are designed for vector-based data. In this paper, motivated by the ubiquity of graph representations in real-world applications, we propose to study selective sampling on graphs. We first present an online version of the well-known Learning with Local and Global Consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as an online ridge regression in the Hilbert space of functions defined on graphs. We prove its regret bound in terms of the structural property (cut size) of a graph. Based on OLLGC, we present a selective sampling algorithm, namely Selective Sampling with Local and Global Consistency (SSLGC), which queries the label of each node based on the confidence of the linear function on graphs. Its bound on the label complexity is also derived. We analyze the low-rank approximation of graph kernels, which enables the online algorithms scale to large graphs. Experiments on benchmark graph datasets show that OLLGC outperforms the state-of-the-art first-order algorithm significantly, and SSLGC achieves comparable or even better results than OLLGC while querying substantially fewer nodes. Moreover, SSLGC is overwhelmingly better than random sampling.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78647750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 31

On community detection in real-world networks and the importance of degree assortativity 现实世界网络中的社区检测及度配性的重要性

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487666

M. Ciglan, M. Laclavik, K. Nørvåg

引用次数: 36

To buy or not to buy: that is the question 买还是不买，这是个问题

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2491129

Oren Etzioni

引用次数: 0

Experience from hosting a corporate prediction market: benefits beyond the forecasts 主持企业预测市场的经验:超出预测的好处

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488212

T. A. Montgomery, Paul M. Stieg, Michael J. Cavaretta, P. Moraal

{"title":"Experience from hosting a corporate prediction market: benefits beyond the forecasts","authors":"T. A. Montgomery, Paul M. Stieg, Michael J. Cavaretta, P. Moraal","doi":"10.1145/2487575.2488212","DOIUrl":"https://doi.org/10.1145/2487575.2488212","url":null,"abstract":"Prediction markets are virtual stock markets used to gain insight and forecast events by leveraging the wisdom of crowds. Popularly applied in the public to cultural questions (election results, box-office returns), they have recently been applied by corporations to leverage employee knowledge and forecast answers to business questions (sales volumes, products and features, release timing). Determining whether to run a prediction market requires practical experience that is rarely described. Over the last few years, Ford Motor Company obtained practical experience by deploying one of the largest corporate prediction markets known. Business partners in the US, Europe, and South America provided questions on new vehicle features, sales volumes, take rates, pricing, and macroeconomic trends. We describe our experience, including both the strong and weak correlations found between predictions and real world results. Evaluating this methodology goes beyond prediction accuracy, however, since there are many side benefits. In addition to the predictions, we discuss the value of comments, stock price changes over time, the ability to overcome bureaucratic limits, and flexibly filling holes in corporate knowledge, enabling better decision making. We conclude with advice on running prediction markets, including writing good questions, market duration, motivating traders and protecting confidential information.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76021300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7