{"title":"Conversations, Machine Learning and Privacy: LinkedIn's Path Towards Transforming Interaction with Its Members","authors":"I. Perisic","doi":"10.1145/3159652.3160600","DOIUrl":"https://doi.org/10.1145/3159652.3160600","url":null,"abstract":"At LinkedIn, we believe that having the right conversations with our members is key to unlocking economic opportunity for them. For us, these conversations are in a broader context than traditionally defined dialogues. A typical dialogue usually only considers a limited time-window as context and is trying to satisfy an immediate intent. Advanced dialogue systems allow a user to take a number of turns, in that short-time window, to get clear on the user's intent. However, our members are having conversations with us over long periods of time about their long-term goals, such as staying informed, growing a professional network, advancing a career, getting a job, finding qualified leads, etc. These conversational goals are often hierarchical. For example, getting a great job is a key part of advancing your career. Our goal at LinkedIn is to be able to have simultaneous conversations with our members on all of these levels. To do this, we have to build machine learning systems that understand that there are multiple multi-level conversations going on. We have made strong headway in building components of this conversational vision by learning how to approximate long-term member value and defining an optimization framework that can incorporate multiple conflicting objectives. These problems consider the states of these conversations when interacting with our members and actively make decisions that optimize this ongoing dialogue. We have a challenging and interesting road ahead. In this talk, Igor will present the current state of LinkedIn's machine-learning efforts towards building robust, long-term conversational systems. 
He will then discuss the potential privacy and ethical issues surrounding having these conversational interactions through an ever-increasing number of touchpoints with our members.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"9 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132056969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demographics and Dynamics of Mechanical Turk Workers","authors":"D. Difallah, Elena Filatova, Panagiotis G. Ipeirotis","doi":"10.1145/3159652.3159661","DOIUrl":"https://doi.org/10.1145/3159652.3159661","url":null,"abstract":"We present an analysis of the population dynamics and demographics of Amazon Mechanical Turk workers based on the results of the survey that we conducted over a period of 28 months, with more than 85K responses from 40K unique participants. The demographics survey is ongoing (as of November 2017), and the results are available at http://demographics.mturk-tracker.com: we provide an API for researchers to download the survey data. We use techniques from the field of ecology, in particular, the capture-recapture technique, to understand the size and dynamics of the underlying population. We also demonstrate how to model and account for the inherent selection biases in such surveys. Our results indicate that there are more than 100K workers available in Amazon's crowdsourcing platform, the participation of the workers in the platform follows a heavy-tailed distribution, and at any given time there are more than 2K active workers. We also show that the half-life of a worker on the platform is around 12-18 months and that the rate of arrival of new workers balances the rate of departures, keeping the overall worker population relatively stable. Finally, we demonstrate how we can estimate the biases of different demographics to participate in the survey tasks, and show how to correct such biases. 
Our methodology is generic and can be applied to any platform where we are interested in understanding the dynamics and demographics of the underlying user population.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122770203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"REV2: Fraudulent User Prediction in Rating Platforms","authors":"Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, C. Faloutsos, V. S. Subrahmanian","doi":"10.1145/3159652.3159729","DOIUrl":"https://doi.org/10.1145/3159652.3159729","url":null,"abstract":"Rating platforms enable large-scale collection of user opinion about items (e.g., products or other users). However, untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present REV2, a system to identify such fraudulent users. We propose three interdependent intrinsic quality metrics---fairness of a user, reliability of a rating and goodness of a product. The fairness and reliability quantify the trustworthiness of a user and rating, respectively, and goodness quantifies the quality of a product. Intuitively, a user is fair if it provides reliable scores that are close to the goodness of products. We propose six axioms to establish the interdependency between the scores, and then, formulate a mutually recursive definition that satisfies these axioms. We extend the formulation to address the cold start problem and incorporate behavior properties. We develop the REV2 algorithm to calculate these intrinsic quality scores for all users, ratings, and products. We show that this algorithm is guaranteed to converge and has linear time complexity. By conducting extensive experiments on five rating datasets, we show that REV2 outperforms nine existing algorithms in detecting fair and unfair users. We reported the 150 most unfair users in the Flipkart network to their review fraud investigators, and 127 users were identified as being fraudulent (84.6% accuracy). 
The REV2 algorithm is being deployed at Flipkart.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"298 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114486290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curriculum Learning for Heterogeneous Star Network Embedding via Deep Reinforcement Learning","authors":"Meng Qu, Jian Tang, Jiawei Han","doi":"10.1145/3159652.3159711","DOIUrl":"https://doi.org/10.1145/3159652.3159711","url":null,"abstract":"Learning node representations for networks has attracted much attention recently due to its effectiveness in a variety of applications. This paper focuses on learning node representations for heterogeneous star networks, which have a center node type linked with multiple attribute node types through different types of edges. In heterogeneous star networks, we observe that the training order of different types of edges affects the learning performance significantly. Therefore we study learning curricula for node representation learning in heterogeneous star networks, i.e., learning an optimal sequence of edges of different types for the node representation learning process. We formulate the problem as a Markov decision process, with the action as selecting a specific type of edges for learning or terminating the training process, and the state as the sequence of edge types selected so far. The reward is calculated as the performance on external tasks with node representations as features, and the goal is to take a series of actions to maximize the cumulative rewards. We propose an approach based on deep reinforcement learning for this problem. Our approach leverages LSTM models to encode states and further estimate the expected cumulative reward of each state-action pair, which essentially measures the long-term performance of different actions at each state. 
Experimental results on real-world heterogeneous star networks demonstrate the effectiveness and efficiency of our approach over competitive baseline approaches.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"2675 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114866963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Search to Research: Direct Answers, Perspectives and Dialog","authors":"Harry Shum","doi":"10.1145/3159652.3160599","DOIUrl":"https://doi.org/10.1145/3159652.3160599","url":null,"abstract":"Advances in artificial intelligence have improved machine understanding of speech, images, and natural language. This in turn has allowed us to greatly enhance the intelligence of products such as Bing and Cortana. This keynote describes our continuing journey beyond keyword-driven systems, into dialog and intelligent agent functionality, helping our users \"research more, search less\". Modern systems attempt to provide concise direct answers, which can fit on a small screen or become a spoken response. To find such answers, Microsoft can draw from a uniquely broad inventory of data sources such as the Bing Web & Knowledge graphs, the workplace graph of Office 365, and the Microsoft Academic Graph. Since these graphs contain a lot of text information, we apply machine reading and comprehension technology to extract concise answers. Microsoft has entries frequently topping the leaderboards in the community's machine reading contests. To select the right answers, we use deep multi-task learning to develop a vector representation that is usable across multiple data sources and scenarios. This is combined with a large-scale data processing and serving infrastructure. We use this not only to find a single answer, but also to find multiple answers in cases where multiple valid perspectives exist. In the case of numeric answers, we provide some context to help users understand what the numbers mean. This is part of our effort to consider not just IQ but EQ in our conversational systems, where the chatbot Xiaoice leads the way in establishing a human connection, to develop long and sustained conversations. 
These advances improve product quality, enable new user experiences and have challenged us to rethink the entire intelligent search platform at Microsoft.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121616151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Transfer Learning for Cross-domain Collaborative Filtering Using Multiple Rating Patterns Approximation","authors":"Ming He, Jiuling Zhang, Peng Yang, K. Yao","doi":"10.1145/3159652.3159675","DOIUrl":"https://doi.org/10.1145/3159652.3159675","url":null,"abstract":"Collaborative filtering techniques are a common approach for building recommendations, and have been widely applied in real recommender systems. However, collaborative filtering usually suffers from limited performance due to the sparsity of user-item interaction. To address this issue, auxiliary information is usually used to improve the performance. Transfer learning provides the key idea of using knowledge from auxiliary domains. An assumption of transfer learning in collaborative filtering is that the source domain is a full rating matrix, which may not hold in many real-world applications. In this paper, we investigate how to leverage rating patterns from multiple incomplete source domains to improve the quality of recommender systems. First, by exploiting the transferred learning, we compress the knowledge from the source domain into a cluster-level rating matrix. The rating patterns in the low-level matrix can be transferred to the target domain. Specifically, we design a knowledge extraction method to enrich rating patterns by relaxing the full rating restriction on the source domain. Finally, we propose a robust multiple-rating-pattern transfer learning model for cross-domain collaborative filtering, which is called MINDTL, to accurately predict missing values in the target domain. 
Extensive experiments on real-world datasets demonstrate that our proposed approach is effective and outperforms several alternative methods.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125049962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shortcutting Label Propagation for Distributed Connected Components","authors":"S. Stergiou, Dipen Rughwani, Kostas Tsioutsiouliklis","doi":"10.1145/3159652.3159696","DOIUrl":"https://doi.org/10.1145/3159652.3159696","url":null,"abstract":"Connected Components is a fundamental graph mining problem that has been studied for the PRAM, MapReduce and BSP models. We present a simple CC algorithm for BSP that does not mutate the graph, converges in O(log n) supersteps and scales to graphs of trillions of edges.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120964006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Dimensional Network Embedding with Hierarchical Structure","authors":"Yao Ma, Z. Ren, Ziheng Jiang, Jiliang Tang, Dawei Yin","doi":"10.1145/3159652.3159680","DOIUrl":"https://doi.org/10.1145/3159652.3159680","url":null,"abstract":"Information networks are ubiquitous in many applications. A popular way to facilitate the information in a network is to embed the network structure into low-dimensional spaces where each node is represented as a vector. The learned representations have been proven to advance various network analysis tasks such as link prediction and node classification. The majority of existing embedding algorithms are designed for networks with one type of nodes and one dimension of relations among nodes. However, many networks in real-world complex systems have multiple types of nodes and multiple dimensions of relations. For example, an e-commerce network can have users and items, and items can be viewed or purchased by users, corresponding to two dimensions of relations. In addition, some types of nodes can present hierarchical structure. For example, authors in publication networks are associated with affiliations, and items in e-commerce networks belong to categories. Most existing methods are not naturally applicable to these networks. In this paper, we aim to learn representations for networks with multiple dimensions and hierarchical structure. In particular, we provide an approach to capture independent information from each dimension and dependent information across dimensions and propose a framework MINES, which performs Multi-dImension Network Embedding with hierarchical Structure. 
Experimental results on a network from a real-world e-commerce website demonstrate the effectiveness of the proposed framework.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125830658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connectivity in Complex Networks: Measures, Inference and Optimization","authors":"Chen Chen","doi":"10.1145/3159652.3170460","DOIUrl":"https://doi.org/10.1145/3159652.3170460","url":null,"abstract":"Networks are ubiquitous in many high impact domains. Among the various aspects of network studies, connectivity is the one that plays an important role in many applications (e.g., information dissemination, robustness analysis, community detection, etc.). The diversified applications have spurred numerous connectivity measures. Accordingly, ad-hoc connectivity optimization methods are designed for each measure, making it hard to model and control the connectivity of the network in a unified framework. On the other hand, it is often impossible to maintain an accurate structure of the network due to network dynamics and noise in real applications, which would affect the accuracy of connectivity measures and the effectiveness of corresponding connectivity optimization methods. In this work, we aim to address the challenges on network connectivity by (1) unifying a wide range of classic network connectivity measures into one uniform model; (2) proposing effective approaches to infer connectivity measures and network structures from dynamic and incomplete input data, and (3) providing a general framework to optimize the connectivity measures in the network.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126051250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding","authors":"Jiaxi Tang, Ke Wang","doi":"10.1145/3159652.3159656","DOIUrl":"https://doi.org/10.1145/3159652.3159656","url":null,"abstract":"Top-N sequential recommendation models each user as a sequence of items interacted in the past and aims to predict top-N ranked items that a user will likely interact in a \"near future\". The order of interaction implies that sequential patterns play an important role where more recent items in a sequence have a larger impact on the next item. In this paper, we propose a Convolutional Sequence Embedding Recommendation Model (Caser) as a solution to address this requirement. The idea is to embed a sequence of recent items into an \"image\" in the time and latent spaces and learn sequential patterns as local features of the image using convolutional filters. This approach provides a unified and flexible network structure for capturing both general preferences and sequential patterns. The experiments on public data sets demonstrated that Caser consistently outperforms state-of-the-art sequential recommendation methods on a variety of common evaluation metrics.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123170568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}