{"title":"Improving IP Geolocation using Query Logs","authors":"Ovidiu Dan, Vaibhav Parikh, Brian D. Davison","doi":"10.1145/2835776.2835820","DOIUrl":"https://doi.org/10.1145/2835776.2835820","url":null,"abstract":"IP geolocation databases map IP addresses to their geographical locations. These databases are important for several applications such as local search engine relevance, credit card fraud protection, geotargetted advertising, and online content delivery. While they are the most popular method of geolocation, they can have low accuracy at the city level. In this paper we evaluate and improve IP geolocation databases using data collected from search engine logs. We generate a large ground-truth dataset using real time global positioning data extracted from search engine logs. We show that incorrect geolocation information can have a negative impact on implicit user metrics. Using the dataset we measure the accuracy of three state-of-the-art commercial IP geolocation databases. We then introduce a technique to improve existing geolocation databases by mining explicit locations from query logs. We show significant accuracy gains in 44 to 49 out of the top 50 countries, depending on the IP geolocation database. Finally, we validate the approach with a large scale A/B experiment that shows improvements in several user metrics.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74434306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Charles, Nikhil R. Devanur, Balasubramanian Sivan
{"title":"Multi-Score Position Auctions","authors":"D. Charles, Nikhil R. Devanur, Balasubramanian Sivan","doi":"10.1145/2835776.2835822","DOIUrl":"https://doi.org/10.1145/2835776.2835822","url":null,"abstract":"In this paper we propose a general family of position auctions used in paid search, which we call multi-score position auctions. These auctions contain the GSP auction and the GSP auction with squashing as special cases. We show experimentally that these auctions contain special cases that perform better than the GSP auction with squashing, in terms of revenue, and the number of clicks on ads. In particular, we study in detail the special case that squashes the first slot alone and show that this beats pure squashing (which squashes all slots uniformly). We study the equilibria that arise in this special case to examine both the first order and the second order effect of moving from the squashing-all-slots auction to the squash-only-the-top-slot auction. For studying the second order effect, we simulate auctions using the value-relevance correlated distribution suggested in Lahaie and Pennock [2007]. Since this distribution is derived from a study of value and relevance distributions in Yahoo! we believe the insights derived from this simulation to be valuable. For measuring the first order effect, in addition to the said simulation, we also conduct experiments using auction data from Bing over several weeks that includes a random sample of all auctions.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77608774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Observing Users","authors":"M. Lalmas","doi":"10.1145/3253875","DOIUrl":"https://doi.org/10.1145/3253875","url":null,"abstract":"","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79768571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Diffusion Processes: Inference and Theory","authors":"Xinran He","doi":"10.1145/2835776.2855084","DOIUrl":"https://doi.org/10.1145/2835776.2855084","url":null,"abstract":"With increasing popularity of social media and social networks sites, analyzing the social networks offers great potential to shed light on human social structure and provides great marketing opportunities. Usually, social network analysis starts with extracting or learning the social network and the associated parameters. Contrary to other analytical tasks, this step is highly non-trivial due to amorphous nature of social ties and the challenges of noisy and incomplete observations. My research focuses on improving accuracy in inferring the network as well as analyzing the consequences when the extracted network is noisy or erroneous. To be more precise, I propose to study the following two questions with a special focus on analyzing diffusion behaviors: (1) How to utilize special properties of social networks to improve accuracy of the extracted network under noisy and missing data; (2) How to characterize the impact of noise in the inferred network and carry out robust analysis and optimization. Usually the first step towards social influence analysis is to infer the diffusion network. Assuming a probabilistic model of influence and a model of how the timing of individuals’ adoption decisions correlates, one can use these data to estimate the strengths of influence between pairs of individuals. However, existing approaches for Network Inference rely on the common assumption that the observations used to train the models are complete, while missing observations are commonplace in practice due to time or technical limitations in data collection. Therefore, I propose to study the impact of incomplete observations and design efficient method to compensate for noise or incompleteness in observed data. I propose to exploit the fact that social networks have more specific structure than arbitrary graphs. A joint estimation of the graph generation model and the actual network structure is likely to significantly improve the estimation accuracy. Moreover, incorporating the content information of the cascade also has potential to improve the inference accuracy. Therefore, I propose to combine the Correlated Topic Model [1] and Hawkes Process [5, 4, 6] into a unified model to utilize content information [2]. Due to noise or missing data in the observations, even in the best case, one would expect that the inferred network structure and link strengths will only be an approximation to the truth; in other words, noise in the data will be pervasive for inferred social networks. I propose to focus on the algorithmic question of Influence Maximization [3] in the context of noisy social network data. More specifically, I propose to consider the following questions: Given an instance of an Influence Model, with level of mis-estimation: (1) Decide whether the objective function on this instance varies smoothly with perturbations to the parameters. (2) If the dependence is smooth, how to find a robustly nearoptimal solution.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"02 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88843001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiFacto: Distributed Factorization Machines","authors":"Mu Li, Ziqi Liu, Alex Smola, Yu-Xiang Wang","doi":"10.1145/2835776.2835781","DOIUrl":"https://doi.org/10.1145/2835776.2835781","url":null,"abstract":"Factorization Machines offer good performance and useful embeddings of data. However, they are costly to scale to large amounts of data and large numbers of features. In this paper we describe DiFacto, which uses a refined Factorization Machine model with sparse memory adaptive constraints and frequency adaptive regularization. We show how to distribute DiFacto over multiple machines using the Parameter Server framework by computing distributed subgradients on minibatches asynchronously. We analyze its convergence and demonstrate its efficiency in computational advertising datasets with billions examples and features.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90274411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affective Computing of Image Emotion Perceptions","authors":"Sicheng Zhao","doi":"10.1145/2835776.2855082","DOIUrl":"https://doi.org/10.1145/2835776.2855082","url":null,"abstract":"Images can convey rich semantics and evoke strong emotions in viewers. The research of my PhD thesis focuses on image emotion computing (IEC), which aims to predict the emotion perceptions of given images. The development of IEC is greatly constrained by two main challenges: affective gap and subjective evaluation [5]. Previous works mainly focused on finding features that can express emotions better to bridge the affective gap, such as elements-of-art based features [2] and shape features [1]. Based on the emotion representation models, including categorical emotion states (CES) and dimensional emotion space (DES) [5], three different tasks are traditionally performed on IEC: affective image classification, regression and retrieval. The state-of-the-art methods on the three above tasks are image-centric, focusing on the dominant emotions for the majority of viewers. For my PhD thesis, I plan to answer the following questions: 1. Compared to the low-level elements-of-art based features, can we find some higher level features that are more interpretable and have stronger link to emotions? 2. Are the emotions that are evoked in viewers by an image subjective and different? If they are, how can we tackle the user-centric emotion prediction? 3. For imagecentric emotion computing, can we predict the emotion distribution instead of the dominant emotion category? 1. The artistic elements must be carefully arranged and orchestrated into meaningful regions and images to describe specific semantics and emotions. The rules, tools or guidelines of arranging and orchestrating the elements-of-art in an artwork are known as the principles-of-art, which consider various artistic aspects, including balance, emphasis, harmony, variety, gradation, movement, rhythm, and proportion [5]. We systematically study and formulize the former 6 artistic principles, explaining the concepts and translating these concepts into mathematical formulae. 2. The images in Abstract dataset [2] were labeled by 14 people on average. 81% images are assigned with 5 to 8 emotions. So the perceived emotions of different viewers may vary. To further demonstrate this observation, we set up a large-scale dataset, named Image-Emotion-Social-Net dataset, with over 1 million images downloaded from Flickr. To get the personalized emotion labels, firstly we use traditional lexicon-based methods as in [4] to obtain the text segmentation results of the title, tags and descrip-","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84875117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank","authors":"Flávio Martins, João Magalhães, Jamie Callan","doi":"10.1145/2835776.2835825","DOIUrl":"https://doi.org/10.1145/2835776.2835825","url":null,"abstract":"In Twitter, and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple mediums, users on the Web interact with and produce new posts about newsworthy topics and give rise to trending topics. This paper proposes to leverage on the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the fact that when a real-world event occurs it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. In this paper, we propose a novel time-aware ranking model that leverages on multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that given query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q. Second, a principled retrieval model that integrates temporal signals in a learning to rank framework, to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning to rank baseline.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75870424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dotan Di Castro, L. Lewin-Eytan, Y. Maarek, R. Wolff, Eyal Zohar
{"title":"Enforcing k-anonymity in Web Mail Auditing","authors":"Dotan Di Castro, L. Lewin-Eytan, Y. Maarek, R. Wolff, Eyal Zohar","doi":"10.1145/2835776.2835803","DOIUrl":"https://doi.org/10.1145/2835776.2835803","url":null,"abstract":"We study the problem of k-anonymization of mail messages in the realistic scenario of auditing mail traffic in a major commercial Web mail service. Mail auditing is necessary in various Web mail debugging and quality assurance activities, such as anti-spam or the qualitative evaluation of novel mail features. It is conducted by trained professionals, often referred to as \"auditors\", who are shown messages that could expose personally identifiable information. We address here the challenge of k-anonymizing such messages, focusing on machine generated mail messages that represent more than 90% of today's mail traffic. We introduce a novel message signature Mail-Hash, specifically tailored to identifying structurally-similar messages, which allows us to put such messages in a same equivalence class. We then define a process that generates, for each class, masked mail samples that can be shown to auditors, while guaranteeing the k-anonymity of users. The productivity of auditors is measured by the amount of non-hidden mail content they can see every day, while considering normal working conditions, which set a limit to the number of mail samples they can review. In addition, we consider k-anonymity over time since, by definition of k-anonymity, every new release places additional constraints on the assignment of samples. We describe in details the results we obtained over actual Yahoo mail traffic, and thus demonstrate that our methods are feasible at Web mail scale. Given the constantly growing concern of users over their email being scanned by others, we argue that it is critical to devise such algorithms that guarantee k-anonymity, and implement associated processes in order to restore the trust of mail users.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89134745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Takuya Konishi, Takuya Ohwa, Sumio Fujita, K. Ikeda, K. Hayashi
{"title":"Extracting Search Query Patterns via the Pairwise Coupled Topic Model","authors":"Takuya Konishi, Takuya Ohwa, Sumio Fujita, K. Ikeda, K. Hayashi","doi":"10.1145/2835776.2835794","DOIUrl":"https://doi.org/10.1145/2835776.2835794","url":null,"abstract":"A fundamental yet new challenge in information retrieval is the identification of patterns behind search queries. For example, the query \"NY restaurant\" and \"boston hotel\" shares the common pattern \"LOCATION SERVICE\". However, because of the diversity of real queries, existing approaches require data preprocessing by humans or specifying the target query domains, which hinders their applicability. We propose a probabilistic topic model that assumes that each term (e.g., \"NY\") has a topic (LOCATION). The key idea is that we consider topic co-occurrence in a query rather than a topic sequence, which significantly reduces computational cost yet enables us to acquire coherent topics without the preprocessing. Using two real query datasets, we demonstrate that the obtained topics are intelligible by humans, and are highly accurate in keyword prediction and query generation tasks.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75412347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and Identifying Advocates for Political Campaigns on Social Media","authors":"Suhas Ranganath, Xia Hu, Jiliang Tang, Huan Liu","doi":"10.1145/2835776.2835807","DOIUrl":"https://doi.org/10.1145/2835776.2835807","url":null,"abstract":"Social media is increasingly being used to access and disseminate information on sociopolitical issues like gun rights and general elections. The popularity and openness of social media makes it conducive for some individuals, known as advocates, who use social media to push their agendas on these issues strategically. Identifying these advocates will caution social media users before reading their information and also enable campaign managers to identify advocates for their digital political campaigns. A significant challenge in identifying advocates is that they employ nuanced strategies to shape user opinion and increase the spread of their messages, making it difficult to distinguish them from random users posting on the campaign. In this paper, we draw from social movement theories and design a quantitative framework to study the nuanced message strategies, propagation strategies, and community structure adopted by advocates for political campaigns in social media. Based on observations of their social media activities manifesting from these strategies, we investigate how to model these strategies for identifying them. We evaluate the framework using two datasets from Twitter, and our experiments demonstrate its effectiveness in identifying advocates for political campaigns with ramifications of this work directed towards assisting users as they navigate through social media spaces.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78868141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}