{"title":"Predicting Multi-step Citywide Passenger Demands Using Attention-based Neural Networks","authors":"Xian Zhou, Yanyan Shen, Yanmin Zhu, Linpeng Huang","doi":"10.1145/3159652.3159682","DOIUrl":"https://doi.org/10.1145/3159652.3159682","url":null,"abstract":"Predicting passenger pickup/dropoff demands based on historical mobility trips has been of great importance towards better vehicle distribution for the emerging mobility-on-demand (MOD) services. Prior works focused on predicting next-step passenger demands at selected locations or hotspots. However, we argue that multi-step citywide passenger demands encapsulate both time-varying demand trends and global statuses, and hence are more beneficial to avoiding demand-service mismatching and developing effective vehicle distribution/scheduling strategies. In this paper, we propose an end-to-end deep neural network solution to the prediction task. We employ the encoder-decoder framework based on convolutional and ConvLSTM units to identify complex features that capture spatiotemporal influences and pickup-dropoff interactions on citywide passenger demands. A novel attention model is incorporated to emphasize the effects of latent citywide mobility regularities. 
We evaluate our proposed method using real-world mobility trips (taxis and bikes) and the experimental results show that our method achieves higher prediction accuracy than the adaptations of the state-of-the-art approaches.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117006982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Athlytics: Winning in Sports with Data","authors":"K. Pelechrinis, E. Papalexakis","doi":"10.1145/3159652.3162005","DOIUrl":"https://doi.org/10.1145/3159652.3162005","url":null,"abstract":"Data and analytics have been part of the sports industry since as early as the 1870s, when the first box score in baseball was recorded. However, only recently have advanced data mining and machine learning techniques been used to facilitate the operations of sports franchises. While part of the reason is the ability to collect more fine-grained data, an equally important factor in this turn to analytics is the huge success and competitive advantage that early adopters of investment in analytics enjoyed (popularized by the best-seller \"Moneyball\", which described the success the Oakland Athletics had with analytics). Draft selection, game-day decision making and player evaluation are just a few of the applications where sports analytics play a crucial role today. Apart from the sports clubs, other stakeholders in the industry (e.g., the leagues' offices, media, etc.) invest in analytics. The leagues increasingly rely on data to decide on potential rule changes. For instance, the most recent rule change in the NFL, i.e., the kickoff touchback, was the result of a thorough data analysis of concussion instances. In this tutorial we will review the literature on data mining and machine learning techniques for sports analytics. We will introduce the audience to the design and methodologies behind advanced metrics such as the adjusted plus/minus for evaluating basketball players, spatial metrics for evaluating the ability of a player to spread the defense in basketball, and the Player Efficiency Rating (PER). 
We will also go in depth into advanced data mining methods, in particular tensor mining, that can analyze heterogeneous data similar to the data available in today's sports world.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128235147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review-Aware Answer Prediction for Product-Related Questions Incorporating Aspects","authors":"Qian Yu, Wai Lam","doi":"10.1145/3159652.3159718","DOIUrl":"https://doi.org/10.1145/3159652.3159718","url":null,"abstract":"On E-commerce sites, there are platforms for users to pose product-related questions, and experienced customers may provide answers voluntarily. A large proportion of the questions asked by users are yes-no questions, reflecting that users wish to know whether or not the product can satisfy a certain criterion or meet a certain expectation. Neither Question Answering (QA) approaches nor Community Question Answering methods are suitable for predicting answers to new questions in this setting. The reason is that questions are product-associated, and many of them concern user experiences and subjective opinions. In addition to existing question-answer pairs, user-written reviews can provide useful clues for answer prediction. In this paper, we propose a new framework that can tackle the task of review-aware answer prediction for product-related questions. The aspect analytics model in this framework learns latent aspects as well as aspect-specific embeddings of reviews via a 3-order Autoencoder. One advantage of this learned model is that it can generate aspect-specific representations for new questions. The predictive answer model in our framework, learned jointly from existing questions, answers, and reviews, is able to predict the answers to new yes-no questions while taking aspects into consideration. In addition, our framework can provide supporting reviews, grouped by relevant aspects, that serve as evidence for explainable answers. 
Experimental results on 15 different product categories from a large-scale benchmark E-commerce QA dataset demonstrate the effectiveness of our framework.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Call to Arms: Embrace Assistive AI Systems!","authors":"A. Broder","doi":"10.1145/3159652.3160603","DOIUrl":"https://doi.org/10.1145/3159652.3160603","url":null,"abstract":"A quarter-century ago Web search stormed the world: within a few years the Web search box became a standard tool of daily life, ready to satisfy the informational, transactional, and navigational queries needed for some task completion. However, two recent trends are dramatically changing the box's role: first, the explosive spread of smartphones brings significant computational resources literally into the pockets of billions of users; second, recent technological advances in machine learning and artificial intelligence, and in particular in speech processing, have led to the wide deployment of assistive AI systems, culminating in personal digital assistants. Along the way, the \"Web search box\" has become an \"assistance request box\" (implicit, in the case of voice-activated assistants) and likewise, many other information processing systems (e.g., e-mail, navigation, personal search, etc.) have adopted assistive aspects. Formally, assistive systems can be viewed as a selection process within a base set of alternatives driven by some user input. The output is either one alternative or a smaller set of alternatives, possibly subject to further selection. Hence, classic IR is a particular instance of this formulation, where the input is a textual query and the selection process is relevance ranking over the corpus. In increasing order of selection capabilities, assistive systems can be classified into three categories: Subordinate: systems where the selection is fully specified by the request; if this results in a singleton the system provides it, otherwise the system provides a random alternative from the result set. 
Therefore, the challenge for subordinate systems consists only in the correct interpretation of the user request (e.g., weather information, simple personal schedule management, a \"play jazz\" request). Conducive: systems that reduce the set of alternatives to a smaller set, possibly via an interactive process (e.g., the classic ten blue links, the three \"smart replies\" in Gmail, interactive recommendations, etc.). Decisive: systems that make all necessary decisions to reach the desired goal (in other words, select a single alternative from the set of possibilities), including resolving ambiguities and other substantive decisions without further input from the user (e.g., typical translation systems, self-driving cars). The main goal of this talk is to examine these developments and to urge the WSDM community to increase its focus on assistive AI solutions that are becoming pertinent to a wide variety of information processing problems. I will mostly present ideas and work in progress, and there will be many more open questions than definitive answers.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123379743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Ranking of Information Retrieval Systems","authors":"Maram Hasanain","doi":"10.1145/3159652.3170458","DOIUrl":"https://doi.org/10.1145/3159652.3170458","url":null,"abstract":"Typical information retrieval system evaluation requires expensive, manually collected relevance judgments of documents, which are used to rank retrieval systems. Due to the high cost of collecting relevance judgments and the ever-growing scale of data to be searched in practice, ranking retrieval systems using manual judgments is becoming less feasible. Methods to automatically rank systems in the absence of judgments have been proposed to tackle this challenge. However, current techniques still fall far short of the rankings achieved using manual judgments. I propose to advance research on automatic system ranking using supervised and unsupervised techniques.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126320993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Expert Cognition for Attributed Network Embedding","authors":"Xiao Huang, Qingquan Song, Jundong Li, Xia Hu","doi":"10.1145/3159652.3159655","DOIUrl":"https://doi.org/10.1145/3159652.3159655","url":null,"abstract":"Attributed network embedding has been widely used in modeling real-world systems. The obtained low-dimensional vector representations of nodes preserve their proximity in terms of both network topology and node attributes, upon which different analysis algorithms can be applied. Recent advances in explanation-based learning and human-in-the-loop models show that involving experts can enhance the performance of many learning tasks. This is because experts have a better cognition of latent information such as domain knowledge, conventions, and hidden relations. This motivates us to employ experts to transform their meaningful cognition into concrete data to advance network embedding. However, learning and incorporating expert cognition into the embedding remains a challenging task, because expert cognition does not have a concrete form, is difficult to measure, and is laborious to obtain. Also, in a real-world network, there are various types of expert cognition, such as the comprehension of word meaning and the discernment of similar nodes. It is nontrivial to identify the types that could lead to a significant improvement in the embedding. In this paper, we study a novel problem of exploring expert cognition for attributed network embedding and propose a principled framework, NEEC. We formulate the process of learning expert cognition as a task of asking experts a number of concise and general queries. Guided by the exemplar theory and prototype theory in cognitive science, the queries are systematically selected and can be generalized to various real-world networks. The answers returned by the experts contain their valuable cognition. 
We model them as new edges and add them directly to the attributed network, upon which different embedding methods can be applied to obtain a more informative embedding representation. Experiments on real-world datasets verify the effectiveness and efficiency of NEEC.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127644009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Intent, Behaviour, and Perceived Satisfaction in Product Search","authors":"Ning Su, Jiyin He, Yiqun Liu, Min Zhang, Shaoping Ma","doi":"10.1145/3159652.3159714","DOIUrl":"https://doi.org/10.1145/3159652.3159714","url":null,"abstract":"As online shopping becomes increasingly popular, users perform more product searches to purchase items. Previous studies have investigated people's online shopping behaviours and ways to predict online purchases. However, from a user perspective, there is still a lack of in-depth understanding of why users search and how they interact with and perceive product search results. In this paper, we conduct both a user study and a log analysis to address the following three questions: (1) what are the intents underlying users' search activities? (2) do users behave differently under different search intents? and (3) how does user-perceived satisfaction relate to their search behaviour as well as their search intents, and can we predict product search satisfaction with interaction signals? Based on an online survey and search logs collected from a major commercial product search engine, we show that user intents in product search fall into three categories: Target Finding (TF), Decision Making (DM) and Exploration (EP). Through a log analysis and a user study, we observe different user interaction patterns as well as perceived satisfaction under these three intents. 
Using a series of user interaction features, we demonstrate that we can effectively predict user satisfaction, especially for TF and DM intents.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132838955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Audio Advertisement Quality","authors":"Samaneh Ebrahimi, H. Vahabi, Matthew Prockup, Oriol Nieto","doi":"10.1145/3159652.3159701","DOIUrl":"https://doi.org/10.1145/3159652.3159701","url":null,"abstract":"Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of the audio ads using a proxy metric called Long Click Rate (LCR), which is defined by the amount of time a user engages with the follow-up display ad (that is shown while the audio ad is playing) divided by the impressions. We later focus on predicting the audio ad quality using only acoustic features such as harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. 
To the best of our knowledge, this is the first large-scale audio ad quality prediction study.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114786921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Profiling through Deep Multimodal Fusion","authors":"G. Farnadi, Jie Tang, M. D. Cock, Marie-Francine Moens","doi":"10.1145/3159652.3159691","DOIUrl":"https://doi.org/10.1145/3159652.3159691","url":null,"abstract":"User profiling in social media has gained a lot of attention due to its varied set of applications in advertising, marketing, recruiting, and law enforcement. Among the various techniques for user modeling, there is fairly limited work on how to merge multiple sources or modalities of user data - such as text, images, and relations - to arrive at more accurate user profiles. In this paper, we propose a deep learning approach that extracts and fuses information across different modalities. Our hybrid user profiling framework utilizes a shared representation between modalities to integrate three sources of data at the feature level, and combines the decision of separate networks that operate on each combination of data sources at the decision level. Our experimental results on more than 5K Facebook users demonstrate that our approach outperforms competing approaches for inferring age, gender and personality traits of social media users. We get highly accurate results with AUC values of more than 0.9 for the task of age prediction and 0.95 for the task of gender prediction.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121996324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSANLS: Accelerating Distributed Nonnegative Matrix Factorization via Sketching","authors":"Yuqiu Qian, Conghui Tan, N. Mamoulis, D. Cheung","doi":"10.1145/3159652.3159662","DOIUrl":"https://doi.org/10.1145/3159652.3159662","url":null,"abstract":"Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low-rank matrices U and V, for a given input matrix M, such that M ≈ UVᵀ. There is increasing interest in parallel and distributed NMF algorithms, due to the high cost of centralized NMF on large matrices. In this paper, we propose a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of the nonnegative least squares subproblems in each iteration for U and V. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and greatly reduces the computational cost of each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. 
The results demonstrate the efficiency and scalability of our framework, compared to the state-of-the-art distributed NMF MPI implementation.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"14 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113944580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}