{"title":"Exploiting Human Mobility Patterns for Point-of-Interest Recommendation","authors":"Zijun Yao","doi":"10.1145/3159652.3170459","DOIUrl":"https://doi.org/10.1145/3159652.3170459","url":null,"abstract":"Point-of-interest (POI) recommendation, which provides personalized recommendation of places to mobile users, is an important task in location-based social networks (LBSNs). Unlike traditional interest-oriented merchandise recommendation, POI recommendation is more complex due to the timing effects: we need to examine whether the POI fits a user's availability. While there are some prior studies which consider temporal effects by solely using check-in timestamps for modeling, they suffer from check-in data sparsity. Recent years, the advent in positioning technology has accumulated a variety of urban data related to human mobility. There is a potential to exploit human mobility patterns from heterogeneous information sources for improving POI recommendation. To this end, we propose a novel method which incorporates the degree of temporal matching between users and POIs into personalized POI recommendations. Specifically, we profile the temporal popularity of POIs, learn the latent regularity to characterize users, and conduct comprehensive experiments with real-world data. 
Evaluation results demonstrate the effectiveness of the proposed method.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116896391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Offline A/B Testing for Recommender Systems","authors":"Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, A. Abraham, Simon Dollé","doi":"10.1145/3159652.3159687","DOIUrl":"https://doi.org/10.1145/3159652.3159687","url":null,"abstract":"Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the users of the platform. It is a well-known practice to run a preliminary offline evaluation on historical data to iterate faster on new ideas, and to detect poor policies in order to avoid losing money or breaking the system. For such offline evaluations, we are interested in methods that can compute offline an estimate of the potential uplift of performance generated by a new technology. Offline performance can be measured using estimators known as counterfactual or off-policy estimators. Traditional counterfactual estimators, such as capped importance sampling or normalised importance sampling, exhibit unsatisfying bias-variance compromises when experimenting on personalized product recommendation systems. To overcome this issue, we model the bias incurred by these estimators rather than bound it in the worst case, which leads us to propose a new counterfactual estimator. 
We provide a benchmark of the different estimators showing their correlation with business metrics observed by running online A/B tests on a large-scale commercial recommender system.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121136215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can you Trust the Trend?: Discovering Simpson's Paradoxes in Social Data","authors":"N. Alipourfard, Peter G. Fennell, Kristina Lerman","doi":"10.1145/3159652.3159684","DOIUrl":"https://doi.org/10.1145/3159652.3159684","url":null,"abstract":"We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlying subgroups. Failure to take this effect into account can lead analysis to wrong conclusions. We present a statistical method to automatically identify Simpson's paradox in data by comparing statistical trends in the aggregate data to those in the disaggregated subgroups. We apply the approach to data from Stack Exchange, a popular question-answering platform, to analyze factors affecting answerer performance, specifically, the likelihood that an answer written by a user will be accepted by the asker as the best answer to his or her question. Our analysis confirms a known Simpson's paradox and identifies several new instances. These paradoxes provide novel insights into user behavior on Stack Exchange.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130828882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution","authors":"J. Pearl","doi":"10.1145/3159652.3176182","DOIUrl":"https://doi.org/10.1145/3159652.3176182","url":null,"abstract":"Current machine learning systems operate, almost exclusively, in a statistical, or model-blind mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference. To demonstrate the essential role of such models, I will present a summary of seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal inference.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123561930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Networks for Information Retrieval","authors":"Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, M. de Rijke, Bhaskar Mitra","doi":"10.1145/3159652.3162009","DOIUrl":"https://doi.org/10.1145/3159652.3162009","url":null,"abstract":"Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121921288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ballpark Crowdsourcing: The Wisdom of Rough Group Comparisons","authors":"Tom Hope, Dafna Shahaf","doi":"10.1145/3159652.3159670","DOIUrl":"https://doi.org/10.1145/3159652.3159670","url":null,"abstract":"Crowdsourcing has become a popular method for collecting labeled training data. However, in many practical scenarios traditional labeling can be difficult for crowdworkers (for example, if the data is high-dimensional or unintuitive, or the labels are continuous). In this work, we develop a novel model for crowdsourcing that can complement standard practices by exploiting people's intuitions about groups and relations between them. We employ a recent machine learning setting, called Ballpark Learning, that can estimate individual labels given only coarse, aggregated signal over groups of data points. To address the important case of continuous labels, we extend the Ballpark setting (which focused on classification) to regression problems. We formulate the problem as a convex optimization problem and propose fast, simple methods with an innate robustness to outliers. We evaluate our methods on real-world datasets, demonstrating how useful constraints about groups can be harnessed from a crowd of non-experts. Our methods can rival supervised models trained on many true labels, and can obtain considerably better results from the crowd than a standard label-collection process (for a lower price). 
By collecting rough guesses on groups of instances and using machine learning to infer the individual labels, our lightweight framework is able to address core crowdsourcing challenges and train machine learning models in a cost-effective way.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134029482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction","authors":"Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, Tie-Yan Liu","doi":"10.1145/3159652.3159690","DOIUrl":"https://doi.org/10.1145/3159652.3159690","url":null,"abstract":"Stock trend prediction plays a critical role in seeking maximized profit from the stock investment. However, precise trend prediction is very difficult since the highly volatile and non-stationary nature of the stock market. Exploding information on the Internet together with the advancing development of natural language processing and text mining techniques have enabled investors to unveil market trends and volatility from online content. Unfortunately, the quality, trustworthiness, and comprehensiveness of online content related to stock market vary drastically, and a large portion consists of the low-quality news, comments, or even rumors. To address this challenge, we imitate the learning process of human beings facing such chaotic online news, driven by three principles: sequential content dependency, diverse influence, and effective and efficient learning. In this paper, to capture the first two principles, we designed a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news. Moreover, we apply the self-paced learning mechanism to imitate the third principle. Extensive experiments on real-world stock market data demonstrate the effectiveness of our framework. 
A further simulation illustrates that a straightforward trading strategy based on our proposed framework can significantly increase the annualized return.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129300634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction","authors":"Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, M. Guo, Qi Liu","doi":"10.1145/3159652.3159666","DOIUrl":"https://doi.org/10.1145/3159652.3159666","url":null,"abstract":"In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion analysis. Previous works mainly focus on textual sentiment classification, however, text information can only disclose the \"tip of the iceberg\" about users' true opinions, of which the most are unobserved but implied by other sources of information such as social relation and users' profile. To address this problem, in this paper we investigate how to predict possibly existing sentiment links in the presence of heterogeneous information. First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset which consists of users' sentiment relation, social relation and profile knowledge by entity-level sentiment extraction method. Then we propose a novel and flexible end-to-end Signed Heterogeneous Information Network Embedding (SHINE) framework to extract users' latent representations from heterogeneous networks and predict the sign of unobserved sentiment links. SHINE utilizes multiple deep autoencoders to map each user into a low-dimension feature space while preserving the network structure. We demonstrate the superiority of SHINE over state-of-the-art baselines on link prediction and node recommendation in two real-world datasets. 
The experimental results also prove the efficacy of SHINE in cold start scenario.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122883161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter","authors":"Preethi Lahoti, Venkata Rama Kiran Garimella, A. Gionis","doi":"10.1145/3159652.3159669","DOIUrl":"https://doi.org/10.1145/3159652.3159669","url":null,"abstract":"People are shifting from traditional news sources to online news at an incredibly fast rate. However, the technology behind online news consumption promotes content that confirms the users' existing point of view. This phenomenon has led to polarization of opinions and intolerance towards opposing views. Thus, a key problem is to model information filter bubbles on social media and design methods to eliminate them. In this paper, we use a machine-learning approach to learn a liberal-conservative ideology space on Twitter, and show how we can use the learned latent space to tackle the filter bubble problem. We model the problem of learning the liberal-conservative ideology space of social media users and media sources as a constrained non-negative matrix-factorization problem. Our model incorporates the social-network structure and content-consumption information in a joint factorization problem with shared latent factors. We validate our model and solution on a real-world Twitter dataset consisting of controversial topics, and show that we are able to separate users by ideology with over 90% purity. When applied to media sources, our approach estimates ideology scores that are highly correlated (Pearson correlation 0.9) with ground-truth ideology scores. 
Finally, we demonstrate the utility of our model in real-world scenarios, by illustrating how the learned ideology latent space can be used to develop exploratory and interactive interfaces that can help users in diffusing their information filter bubble.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123020783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation","authors":"Jooyeon Kim, Behzad Tabibian, Alice H. Oh, B. Scholkopf, M. Gomez-Rodriguez","doi":"10.1145/3159652.3159734","DOIUrl":"https://doi.org/10.1145/3159652.3159734","url":null,"abstract":"Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if the story receives enough flags, it is sent to a trusted third party for fact checking. If this party identifies the story as misinformation, it is marked as disputed. However, given the uncertain number of exposures, the high cost of fact checking, and the trade-off between flags and exposures, the above mentioned procedure requires careful reasoning and smart algorithms which, to the best of our knowledge, do not exist to date. In this paper, we first introduce a flexible representation of the above procedure using the framework of marked temporal point processes. Then, we develop a scalable online algorithm, CURB, to select which stories to send for fact checking and when to do so to efficiently reduce the spread of misinformation with provable guarantees. In doing so, we need to solve a novel stochastic optimal control problem for stochastic differential equations with jumps, which is of independent interest. 
Experiments on two real-world datasets gathered from Twitter and Weibo show that our algorithm may be able to effectively reduce the spread of fake news and misinformation.","PeriodicalId":401247,"journal":{"name":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130378203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}