{"title":"Query suggestions in the absence of query logs","authors":"S. Bhatia, Debapriyo Majumdar, P. Mitra","doi":"10.1145/2009916.2010023","DOIUrl":"https://doi.org/10.1145/2009916.2010023","url":null,"abstract":"After an end-user has partially input a query, intelligent search engines can suggest possible completions of the partial query to help end-users quickly express their information needs. All major web-search engines and most proposed methods that suggest queries rely on search engine query logs to determine possible query suggestions. However, for customized search systems in the enterprise domain, intranet search, or personalized search such as email or desktop search or for infrequent queries, query logs are either not available or the user base and the number of past user queries is too small to learn appropriate models. We propose a probabilistic mechanism for generating query suggestions from the corpus without using query logs. We utilize the document corpus to extract a set of candidate phrases. As soon as a user starts typing a query, phrases that are highly correlated with the partial user query are selected as completions of the partial query and are offered as query suggestions. Our proposed approach is tested on a variety of datasets and is compared with state-of-the-art approaches. The experimental results clearly demonstrate the effectiveness of our approach in suggesting queries with higher quality.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122445136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic tag recommendation using concept model","authors":"Chenliang Li, Anwitaman Datta, Aixin Sun","doi":"10.1145/2009916.2010098","DOIUrl":"https://doi.org/10.1145/2009916.2010098","url":null,"abstract":"The common tags given by multiple users to a particular document are often semantically relevant to the document and each tag represents a specific topic. In this paper, we attempt to emulate human tagging behavior to recommend tags by considering the concepts contained in documents. Specifically, we represent each document using a few most relevant concepts contained in the document, where the concept space is derived from Wikipedia. Tags are then recommended based on the tag concept model derived from the annotated documents of each tag. Evaluated on a Delicious dataset of more than 53K documents, the proposed technique achieved comparable tag recommendation accuracy as the state-of-the-art, while yielding an order of magnitude speed-up.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127054408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrapping subjectivity detection","authors":"V. Jijkoun, M. de Rijke","doi":"10.1145/2009916.2010081","DOIUrl":"https://doi.org/10.1145/2009916.2010081","url":null,"abstract":"We describe a method for automatically generating subjectivity clues for a specific topic and a set of (relevant) document, evaluating it on the task of classifying sentences w.r.t. subjectivity, with improvements over previous work.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125273133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
James Caverlee, Zhiyuan Cheng, B. Eoff, Chiao-Fang Hsu, K. Kamath, Jeffrey McGee
{"title":"CrowdTracker: enabling community-based real-time web monitoring","authors":"James Caverlee, Zhiyuan Cheng, B. Eoff, Chiao-Fang Hsu, K. Kamath, Jeffrey McGee","doi":"10.1145/2009916.2010161","DOIUrl":"https://doi.org/10.1145/2009916.2010161","url":null,"abstract":"CrowdTracker is a community-based web monitoring system optimized for real-time web streams like Twitter, Facebook, and Google Buzz. In this demo summary, we provide an overview of the system and architecture, and outline the demonstration plan.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127889615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the suitability of diversity metrics for learning-to-rank for diversity","authors":"Rodrygo L. T. Santos, C. Macdonald, I. Ounis","doi":"10.1145/2009916.2010111","DOIUrl":"https://doi.org/10.1145/2009916.2010111","url":null,"abstract":"An optimally diverse ranking should achieve the maximum coverage of the aspects underlying an ambiguous or underspecified query, with minimum redundancy with respect to the covered aspects. Although evaluation metrics that reward coverage and penalise redundancy provide intuitive objective functions for learning a diverse ranking, it is unclear whether they are the most effective. In this paper, we contrast the suitability of relevance and diversity metrics as objective functions for learning a diverse ranking. Our results in the context of the diversity task of the TREC 2009 and 2010 Web tracks show that diversity metrics are not necessarily better suited for guiding a learning approach. Moreover, the suitability of these metrics is compromised as they try to penalise redundancy during the learning process.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127658295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information organization and retrieval with collaboratively generated content","authors":"Eugene Agichtein, E. Gabrilovich","doi":"10.1145/2009916.2010173","DOIUrl":"https://doi.org/10.1145/2009916.2010173","url":null,"abstract":"Proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). In this tutorial, we will discuss organizing and exploiting Collaboratively Generated Content (CGC) for information organization and retrieval. Specifically, we intend to cover two complementary areas of the problem: (1) using such content as a powerful enabling resource for knowledge-enriched, intelligent representations and new information retrieval algorithms, and (2) development of supporting technologies for extracting, filtering, and organizing collaboratively created content. The unprecedented amounts of information in CGC enable new, knowledge-rich approaches to information access, which are significantly more powerful than the conventional word-based methods. Considerable progress has been made in this direction over the last few years. Examples include explicit manipulation of human-defined concepts and their use to augment the bag of words (cf. Explicit Semantic Analysis), using large-scale taxonomies of topics from Wikipedia or the Open Directory Project to construct additional class-based features, or using Wikipedia for better word sense disambiguation. However, the quality and comprehensiveness of collaboratively created content vary widely, and in order for this resource to be useful, a significant amount of preprocessing, filtering, and organization is necessary. Consequently, new methods for analyzing CGC and corresponding user interactions are required to effectively harness the resulting knowledge. Thus, not only the content repositories can be used to improve IR methods, but the reverse pollination is also possible, as better information extraction methods can be used for automatically collecting more knowledge, or verifying the contributed content. This natural connection between modeling the generation process of CGC and effectively using the accumulated knowledge suggests covering both areas together in a single tutorial. The intended audience of the tutorial includes IR researchers and graduate students, who would like to learn about the recent advances and research opportunities in working with collaboratively generated content. The emphasis of the tutorial is on comparing the existing approaches and presenting practical techniques that IR practitioners can use in their research. We also cover open research challenges, as well as survey available resources (software tools and data) for getting started in this research field.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128129105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization","authors":"Hua Wang, Heng Huang, F. Nie, C. Ding","doi":"10.1145/2009916.2010041","DOIUrl":"https://doi.org/10.1145/2009916.2010041","url":null,"abstract":"The lack of sufficient labeled Web pages in many languages, especially for those uncommonly used ones, presents a great challenge to traditional supervised classification methods to achieve satisfactory Web page classification performance. To address this, we propose a novel Nonnegative Matrix Tri-factorization (NMTF) based Dual Knowledge Transfer (DKT) approach for cross-language Web page classification, which is based on the following two important observations. First, we observe that Web pages for a same topic from different languages usually share some common semantic patterns, though in different representation forms. Second, we also observe that the associations between word clusters and Web page classes are a more reliable carrier than raw words to transfer knowledge across languages. With these recognitions, we attempt to transfer knowledge from the auxiliary language, in which abundant labeled Web pages are available, to target languages, in which we want classify Web pages, through two different paths: word cluster approximations and the associations between word clusters and Web page classes. Due to the reinforcement between these two different knowledge transfer paths, our approach can achieve better classification accuracy. We evaluate the proposed approach in extensive experiments using a real world cross-language Web page data set. Promising results demonstrate the effectiveness of our approach that is consistent with our theoretical analyses.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"7 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131803834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UPS: efficient privacy protection in personalized web search","authors":"Gang Chen, He Bai, L. Shou, Ke Chen, Yunjun Gao","doi":"10.1145/2009916.2009999","DOIUrl":"https://doi.org/10.1145/2009916.2009999","url":null,"abstract":"In recent years, personalized web search (PWS) has demonstrated effectiveness in improving the quality of search service on the Internet. Unfortunately, the need for collecting private information in PWS has become a major barrier for its wide proliferation. We study privacy protection in PWS engines which capture personalities in user profiles. We propose a PWS framework called UPS that can generalize profiles in for each query according to user-specified privacy requirements. Two predictive metrics are proposed to evaluate the privacy breach risk and the query utility for hierarchical user profile. We develop two simple but effective generalization algorithms for user profiles allowing for query-level customization using our proposed metrics. We also provide an online prediction mechanism based on query utility for deciding whether to personalize a query in UPS. Extensive experiments demonstrate the efficiency and effectiveness of our framework.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133160168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularized latent semantic indexing","authors":"Quan Wang, Jun Xu, Hang Li, Nick Craswell","doi":"10.1145/2009916.2010008","DOIUrl":"https://doi.org/10.1145/2009916.2010008","url":null,"abstract":"Topic modeling can boost the performance of information retrieval, but its real-world application is limited due to scalability issues. Scaling to larger document collections via parallelization is an active area of research, but most solutions require drastic steps such as vastly reducing input vocabulary. We introduce Regularized Latent Semantic Indexing (RLSI), a new method which is designed for parallelization. It is as effective as existing topic models, and scales to larger datasets without reducing input vocabulary. RLSI formalizes topic modeling as a problem of minimizing a quadratic loss function regularized by l₂ and/or l₁ norm. This formulation allows the learning process to be decomposed into multiple sub-optimization problems which can be optimized in parallel, for example via MapReduce. We particularly propose adopting l₂ norm on topics and l₁ norm on document representations, to create a model with compact and readable topics and useful for retrieval. Relevance ranking experiments on three TREC datasets show that RLSI performs better than LSI, PLSI, and LDA, and the improvements are sometimes statistically significant. Experiments on a web dataset, containing about 1.6 million documents and 7 million terms, demonstrate a similar boost in performance on a larger corpus and vocabulary than in previous studies.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133700701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical online retrieval evaluation","authors":"Filip Radlinski, Yisong Yue","doi":"10.1145/2009916.2010171","DOIUrl":"https://doi.org/10.1145/2009916.2010171","url":null,"abstract":"Online evaluation is amongst the few evaluation techniques available to the information retrieval community that is guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers to any evaluation of retrieval quality conducted while observing user behavior in a natural context. However, it is rarely employed outside of large commercial search engines due primarily to a perception that it is impractical at small scales. The goal of this tutorial is to familiarize information retrieval researchers with state-of-the-art techniques in evaluating information retrieval systems based on natural user clicking behavior, as well as to show how such methods can be practically deployed. In particular, our focus will be on demonstrating how the Interleaving approach and other click based techniques contrast with traditional offline evaluation, and how these online methods can be effectively used in academic-scale research. In addition to lecture notes, we will also provide sample software and code walk-throughs to showcase the ease with which Interleaving and other click-based methods can be employed by students, academics and other researchers.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122514787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}