{"title":"Terms, Topics & Tasks: Enhanced User Modelling for Better Personalization","authors":"Rishabh Mehrotra, Emine Yilmaz","doi":"10.1145/2808194.2809467","DOIUrl":"https://doi.org/10.1145/2808194.2809467","url":null,"abstract":"Given the distinct preferences of different users while using search engines, search personalization has become an important problem in information retrieval. Most approaches to search personalization are based on identifying topics a user may be interested in and personalizing search results based on this information. While information about users' topical interests can be highly valuable for personalizing search results and improving user experience, it ignores the fact that two users with similar topical interests may still want to accomplish very different tasks related to that topic (e.g., the tasks a broker performs related to finance are likely to be very different from those of a regular investor). Hence, considering users' topical interests jointly with the types of tasks they are likely to be interested in could result in better personalized search results. We present an approach that uses search task information embedded in search logs to represent users by their actions over a task space as well as over their topical-interest space. In particular, we describe a tensor-based approach that represents each user in terms of (i) the user's topical interests and (ii) the user's search task behaviors in a coupled fashion, and we use these representations for personalization. Additionally, we integrate users' historic search behavior in a coupled matrix-tensor factorization framework to learn user representations. Through extensive evaluation via query recommendations and user cohort analysis, we demonstrate the value of considering topic-specific task information when developing user models.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124266955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Less Biased Web Search","authors":"Xitong Liu, Hui Fang, Deng Cai","doi":"10.1145/2808194.2809476","DOIUrl":"https://doi.org/10.1145/2808194.2809476","url":null,"abstract":"Web search engines now serve as essential assistants that help users make decisions in many aspects of their lives. Delivering correct and impartial information is a crucial function of search engines, as any false information may lead to unwise decisions and thus undesirable consequences. Unfortunately, a recent study revealed that Web search engines tend to provide biased information, with most results supporting the beliefs users convey in their queries regardless of the truth. In this paper we propose to alleviate bias in Web search by predicting the topical polarity of documents, i.e., the overall tendency of a document to support or disapprove of the belief expressed in the query. Applying these predictions to balance search results would give users less biased information and therefore help them make wiser decisions. To achieve this goal, we propose a novel textual segment extraction method to distill and generate document feature representations, and we leverage a convolutional neural network, an effective deep learning approach, to predict the topical polarity of documents. We conduct extensive experiments on a set of queries with medical intents and demonstrate that our model performs well empirically, identifying topical polarity with satisfactory accuracy. To the best of our knowledge, our work is the first to investigate the mitigation of bias in Web search, and it could provide directions for future research.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126417341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and Evaluation of Search Tasks for IIR Experiments using a Cognitive Complexity Framework","authors":"D. Kelly, Jaime Arguello, A. Edwards, Wan-Ching Wu","doi":"10.1145/2808194.2809465","DOIUrl":"https://doi.org/10.1145/2808194.2809465","url":null,"abstract":"One of the most challenging aspects of designing interactive information retrieval (IIR) experiments with users is the development of search tasks. We describe an evaluation of 20 search tasks that were designed for use in IIR experiments and developed using a cognitive complexity framework from educational theory. The search tasks represent five levels of cognitive complexity and four topical domains. The tasks were evaluated in the context of a laboratory IIR experiment with 48 participants. Behavioral and self-report data were used to characterize and understand differences among tasks. Results showed that more cognitively complex tasks required significantly more search activity from participants (e.g., more queries, clicks, and time to complete). However, participants did not evaluate more cognitively complex tasks as more difficult and were equally satisfied with their performance across tasks. Our work makes four contributions: (1) it adds to what is known about the relationship among task, search behaviors, and user experience; (2) it presents a framework for task creation and evaluation; (3) it provides tasks and questionnaires that can be reused by others; and (4) it raises questions about the findings and assumptions of many recent studies that use only behavioral signals from search logs as evidence of task difficulty and searcher satisfaction, as many of our results directly contradict these findings.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132136604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching for Twitter Posts by Location","authors":"Ariana S. Minot, Andrew Heier, Davis E. King, O. Simek, N. Stanisha","doi":"10.1145/2808194.2809480","DOIUrl":"https://doi.org/10.1145/2808194.2809480","url":null,"abstract":"The microblogging service Twitter is an increasingly popular platform for sharing information worldwide. This motivates mining information from Twitter, which can serve as a valuable resource for applications such as event localization and location-specific recommendation systems. Geolocation of Twitter messages is integral to such applications. However, only a small percentage of Twitter posts are accompanied by a GPS location. Recent work has begun exploring ways to estimate the unknown location of Twitter users based on the content of their posts and various available metadata. This presents interesting challenges for natural language processing and multi-objective optimization. We propose a new method for estimating the home location of users based on both the content of their posts and their social connections on Twitter. Our method achieves an accuracy of 77% within 10 km in exchange for a reduction in coverage of 76% relative to techniques that use only social connections.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131580324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Walks on the Reputation Graph","authors":"Sabir Ribas, B. Ribeiro-Neto, Rodrygo L. T. Santos, E. D. S. E. Silva, A. Ueda, N. Ziviani","doi":"10.1145/2808194.2809462","DOIUrl":"https://doi.org/10.1145/2808194.2809462","url":null,"abstract":"The identification of reputable entities is an important task in business, education, and many other fields. However, because reputation is an arguably subjective, multi-faceted concept, quantifying it is challenging. In this paper, instead of relying on a single, precise definition of reputation, we propose to exploit the transference of reputation among entities in order to identify the most reputable ones. To this end, we propose a novel random walk model to infer the reputation of a target set of entities with respect to suitable sources of reputation. We instantiate our model in an academic search setting by modeling research groups as reputation sources and publication venues as reputation targets. By relying on publishing behavior as a reputation signal, we demonstrate the effectiveness of our model compared with standard citation-based approaches for identifying reputable venues as well as researchers in the broad area of computer science. In addition, we demonstrate the robustness of our model to perturbations in the selection of reputation sources. Finally, we show that effective reputation sources can be chosen via the proposed model itself in a semi-automatic fashion.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132592551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Probability Ranking Principle is Not Optimal in Adversarial Retrieval Settings","authors":"R. Ben-Basat, Moshe Tennenholtz, Oren Kurland","doi":"10.1145/2808194.2809456","DOIUrl":"https://doi.org/10.1145/2808194.2809456","url":null,"abstract":"The probability ranking principle (PRP), which ranks documents in response to a query by their relevance probabilities, is the theoretical foundation of most ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects, specifically, changes to documents that result from a given ranking. Yet in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that, indeed, the PRP can be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game-theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single-topic and multi-topic) and is based on different assumptions about the writing qualities of documents' authors. We show that in some cases, introducing randomization into the document ranking function yields overall user utility that exceeds that of applying the PRP.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117294890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Reinforce Search Effectiveness","authors":"Jiyun Luo, Xuchu Dong, G. Yang","doi":"10.1145/2808194.2809468","DOIUrl":"https://doi.org/10.1145/2808194.2809468","url":null,"abstract":"Session search is an Information Retrieval (IR) task that handles a series of queries issued for a search task. In this paper, we propose a novel reinforcement learning (RL) style information retrieval framework and develop a new feedback learning algorithm that models user feedback, including clicks and query reformulations, as reinforcement signals and generates rewards in the RL framework. From a new perspective, we view session search as a cooperative game played between two agents, the user and the search engine. We study the communication between the two agents; they always exchange opinions on \"whether the current stage of search is relevant\" and \"whether we should explore now.\" The algorithm uses an EM algorithm to infer user feedback models from the query logs. We compare our algorithm with several state-of-the-art session search algorithms and evaluate it on the most recent TREC Session Tracks (2012 to 2014). The experimental results demonstrate that our approach is highly effective for improving session search accuracy.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116260157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval","authors":"Jesus A. Rodriguez Perez, J. Jose","doi":"10.1145/2808194.2809466","DOIUrl":"https://doi.org/10.1145/2808194.2809466","url":null,"abstract":"In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter (140 characters) than web documents and contain contextual tags indicating the topic (hashtags) and intended audience (mentions) of the document, as well as links to external content (URLs). Traditional and state-of-the-art retrieval models perform rather poorly at capturing the relevance of tweets, since they were designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Second, we experiment with enhancing the behaviour of the best-performing retrieval model observed by means of a re-ranking approach that accounts for the relative differences in these dimensions among tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then used to produce scores, which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document, and their ordering, are connected with the relevance of microblogs.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124673189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Initial Analytical Exploration of Retrievability","authors":"Aldo Lipani, M. Lupu, Akiko Aizawa, A. Hanbury","doi":"10.1145/2808194.2809495","DOIUrl":"https://doi.org/10.1145/2808194.2809495","url":null,"abstract":"We approach the problem of retrievability from an analytical perspective, starting with modeling conjunctive and disjunctive queries in a Boolean model. We show that this represents an upper bound on retrievability for all other best-match algorithms. We follow this by observing the imbalance in the distribution of retrievability using the Gini coefficient. Simulation-based experiments show the behavior of the Gini coefficient for retrievability under different types and lengths of queries, as well as under different assumptions about the document length distribution in a collection.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131371135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Divergence Measures and Static Index Pruning","authors":"Ruey-Cheng Chen, Chia-Jung Lee, W. Bruce Croft","doi":"10.1145/2808194.2809472","DOIUrl":"https://doi.org/10.1145/2808194.2809472","url":null,"abstract":"We study the problem of static index pruning in a well-known divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions and can therefore be minimized exactly using an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure, the contribution is modeled on a different returns curve. In our experiments on GOV2 data, Rényi divergence of order infinity appears to be the most effective. This divergence measure significantly outperforms many standard methods and achieves the same retrieval effectiveness as the full index using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy one would usually expect from a full index.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116526351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}