{"title":"Deriving User- and Content-specific Rewards for Contextual Bandits","authors":"Paolo Dragone, Rishabh Mehrotra, M. Lalmas","doi":"10.1145/3308558.3313592","DOIUrl":"https://doi.org/10.1145/3308558.3313592","url":null,"abstract":"Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rates, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumptions that streaming time distribution heavily depends on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ ”co-clustering”, an unsupervised learning technique to simultaneously extract clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-clustered based reward functions lead to improvement of over 25% in expected stream rate, compared to the standard binarized rewards.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77590749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Binary Hash Codes for Fast Anchor Link Retrieval across Networks","authors":"Yongqing Wang, Huawei Shen, Jinhua Gao, Xueqi Cheng","doi":"10.1145/3308558.3313430","DOIUrl":"https://doi.org/10.1145/3308558.3313430","url":null,"abstract":"Users are usually involved in multiple social networks, without explicit anchor links that reveal the correspondence among different accounts of the same user across networks. Anchor link prediction aims to identify the hidden anchor links, which is a fundamental problem for user profiling, information cascading, and cross-domain recommendation. Although existing methods perform well in the accuracy of anchor link prediction, the pairwise search manners on inferring anchor links suffer from big challenge when being deployed in practical systems. To combat the challenges, in this paper we propose a novel embedding and matching architecture to directly learn binary hash code for each node. Hash codes offer us an efficient index to filter out the candidate node pairs for anchor link prediction. Extensive experiments on synthetic and real world large-scale datasets demonstrate that our proposed method has high time efficiency without loss of competitive prediction accuracy in anchor link prediction.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90390654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Your Style Your Identity: Leveraging Writing and Photography Styles for Drug Trafficker Identification in Darknet Markets over Attributed Heterogeneous Information Network","authors":"Yiming Zhang, Yujie Fan, Wei Song, Shifu Hou, Yanfang Ye, X. Li, Liang Zhao, C. Shi, Jiabin Wang, Qi Xiong","doi":"10.1145/3308558.3313537","DOIUrl":"https://doi.org/10.1145/3308558.3313537","url":null,"abstract":"Due to its anonymity, there has been a dramatic growth of underground drug markets hosted in the darknet (e.g., Dream Market and Valhalla). To combat drug trafficking (a.k.a. illicit drug trading) in the cyberspace, there is an urgent need for automatic analysis of participants in darknet markets. However, one of the key challenges is that drug traffickers (i.e., vendors) may maintain multiple accounts across different markets or within the same market. To address this issue, in this paper, we propose and develop an intelligent system named uStyle-uID leveraging both writing and photography styles for drug trafficker identification at the first attempt. At the core of uStyle-uID is an attributed heterogeneous information network (AHIN) which elegantly integrates both writing and photography styles along with the text and photo contents, as well as other supporting attributes (i.e., trafficker and drug information) and various kinds of relations. Built on the constructed AHIN, to efficiently measure the relatedness over nodes (i.e., traffickers) in the constructed AHIN, we propose a new network embedding model Vendor2Vec to learn the low-dimensional representations for the nodes in AHIN, which leverages complementary attribute information attached in the nodes to guide the meta-path based random walk for path instances sampling. After that, we devise a learning model named vIdentifier to classify if a given pair of traffickers are the same individual. Comprehensive experiments on the data collections from four different darknet markets are conducted to validate the effectiveness of uStyle-uID which integrates our proposed method in drug trafficker identification by comparisons with alternative approaches.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86016260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Peer Communication to Enhance Crowdsourcing","authors":"Wei Tang, Ming Yin, Chien-Ju Ho","doi":"10.1145/3308558.3313554","DOIUrl":"https://doi.org/10.1145/3308558.3313554","url":null,"abstract":"Crowdsourcing has become a popular tool for large-scale data collection where it is often assumed that crowd workers complete the work independently. In this paper, we relax such independence property and explore the usage of peer communication-a kind of direct interactions between workers-in crowdsourcing. In particular, in the crowdsourcing setting with peer communication, a pair of workers are asked to complete the same task together by first generating their initial answers to the task independently and then freely discussing the task with each other and updating their answers after the discussion. We first experimentally examine the effects of peer communication on individual microtasks. Our results conducted on three types of tasks consistently suggest that work quality is significantly improved in tasks with peer communication compared to tasks where workers complete the work independently. We next explore how to utilize peer communication to optimize the requester's total utility while taking into account higher data correlation and higher cost introduced by peer communication. In particular, we model the requester's online decision problem of whether and when to use peer communication in crowdsourcing as a constrained Markov decision process which maximizes the requester's total utility under budget constraints. Our proposed approach is empirically shown to bring higher total utility compared to baseline approaches.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73588197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election","authors":"Ceren Budak","doi":"10.1145/3308558.3313721","DOIUrl":"https://doi.org/10.1145/3308558.3313721","url":null,"abstract":"The spread of content produced by fake news publishers was one of the most discussed characteristics of the 2016 U.S. Presidential Election. Yet, little is known about the prevalence and focus of such content, how its prevalence changed over time, and how this prevalence related to important election dynamics. In this paper, we address these questions using tweets that mention the two presidential candidates sampled at the daily level, the news content mentioned in such tweets, and open-ended responses from nationally representative telephone interviews. The results of our analysis highlight various important lessons for news consumers and journalists. We find that (i.) traditional news producers outperformed fake news producers in aggregate, (ii.) the prevalence of content produced by fake news publishers increased over the course of the campaign-particularly among tweets that mentioned Clinton, and (iii.) changes in such prevalence were closely following changes in net Clinton favorability. Turning to content, we (iv.) identify similarities and differences in agenda setting by fake and traditional news media and show that (v.) information individuals most commonly reported to having read, seen or heard about the candidates was more closely aligned with content produced by fake news outlets than traditional news outlets, in particular for information Republican voters retained about Clinton. We also model fake-ness of retained information as a function of demographics characteristics. Implications for platform owners, news consumers, and journalists are discussed.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"223 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73207980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tortoise or Hare? Quantifying the Effects of Performance on Mobile App Retention","authors":"Agustin Zuniga, Huber Flores, Eemil Lagerspetz, P. Nurmi, S. Tarkoma, P. Hui, J. Manner","doi":"10.1145/3308558.3313428","DOIUrl":"https://doi.org/10.1145/3308558.3313428","url":null,"abstract":"We contribute by quantifying the effect of network latency and battery consumption on mobile app performance and retention, i.e., user's decisions to continue or stop using apps. We perform our analysis by fusing two large-scale crowdsensed datasets collected by piggybacking on information captured by mobile apps. We find that app performance has an impact in its retention rate. Our results demonstrate that high energy consumption and high latency decrease the likelihood of retaining an app. Conversely, we show that reducing latency or energy consumption does not guarantee higher likelihood of retention as long as they are within reasonable standards of performance. However, we also demonstrate that what is considered reasonable depends on what users have been accustomed to, with device and network characteristics, and app category playing a role. As our second contribution, we develop a model for predicting retention based on performance metrics. We demonstrate the benefits of our model through empirical benchmarks which show that our model not only predicts retention accurately, but generalizes well across application categories, locations and other factors moderating the effect of performance.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74505905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Talent Flow Forecast with Dynamic Latent Factor Model?","authors":"Le Zhang, Hengshu Zhu, Tong Xu, Chen Zhu, Chuan Qin, Hui Xiong, Enhong Chen","doi":"10.1145/3308558.3313525","DOIUrl":"https://doi.org/10.1145/3308558.3313525","url":null,"abstract":"The understanding of talent flow is critical for sharpening company talent strategy to keep competitiveness in the current fast-evolving environment. Existing studies on talent flow analysis generally rely on subjective surveys. However, without large-scale quantitative studies, there are limits to deliver fine-grained predictive business insights for better talent management. To this end, in this paper, we aim to introduce a big data-driven approach for predictive talent flow analysis. Specifically, we first construct a time-aware job transition tensor by mining the large-scale job transition records of digital resumes from online professional networks (OPNs), where each entry refers to a fine-grained talent flow rate of a specific job position between two companies. Then, we design a dynamic latent factor based Evolving Tensor Factorization (ETF) model for predicting the future talent flows. In particular, a novel evolving feature by jointly considering the influence of previous talent flows and global market is introduced for modeling the evolving nature of each company. Furthermore, to improve the predictive performance, we also integrate several representative attributes of companies as side information for regulating the model inference. Finally, we conduct extensive experiments on large-scale real-world data for evaluating the model performances. The experimental results clearly validate the effectiveness of our approach compared with state-of-the-art baselines in terms of talent flow forecast. Meanwhile, the results also reveal some interesting findings on the regularity of talent flows, e.g. Facebook becomes more and more attractive for the engineers from Google in 2016.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77014634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SWeG: Lossless and Lossy Summarization of Web-Scale Graphs","authors":"Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan","doi":"10.1145/3308558.3313402","DOIUrl":"https://doi.org/10.1145/3308558.3313402","url":null,"abstract":"Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78126246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Online Spell Correction for Personal Search","authors":"Jai Gupta, Zhen Qin, Michael Bendersky, Donald Metzler","doi":"10.1145/3308558.3313706","DOIUrl":"https://doi.org/10.1145/3308558.3313706","url":null,"abstract":"Spell correction is a must-have feature for any modern search engine in applications such as web or e-commerce search. Typical spell correction solutions used in production systems consist of large indexed lookup tables based on a global model trained across many users over a large scale web corpus or a query log. For search over personal corpora, such as email, this global solution is not sufficient, as it ignores the user's personal lexicon. Without personalization, global spelling fails to correct tail queries drawn from a user's own, often idiosyncratic, lexicon. Personalization using existing algorithms is difficult due to resource constraints and unavailability of sufficient data to build per-user models. In this work, we propose a simple and effective personalized spell correction solution that augments existing global solutions for search over private corpora. Our event driven spell correction candidate generation method is specifically designed with personalization as the key construct. Our novel spell correction and query completion algorithms do not require complex model training and is highly efficient. The proposed solution has shown over 30% click-through rate gain on affected queries when evaluated against a range of strong commercial personal search baselines - Google's Gmail, Drive, and Calendar search production systems.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79867354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Matching Networks for Engineering Diagram Search","authors":"Zhuyun Dai, Zhen Fan, Hafeezul Rahman, Jamie Callan","doi":"10.1145/3308558.3313500","DOIUrl":"https://doi.org/10.1145/3308558.3313500","url":null,"abstract":"Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image. This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams. Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating network enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80160979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}