Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval最新文献

筛选
英文 中文
Query suggestions in the absence of query logs 无查询日志时的查询建议
S. Bhatia, Debapriyo Majumdar, P. Mitra
{"title":"Query suggestions in the absence of query logs","authors":"S. Bhatia, Debapriyo Majumdar, P. Mitra","doi":"10.1145/2009916.2010023","DOIUrl":"https://doi.org/10.1145/2009916.2010023","url":null,"abstract":"After an end-user has partially input a query, intelligent search engines can suggest possible completions of the partial query to help end-users quickly express their information needs. All major web-search engines and most proposed methods that suggest queries rely on search engine query logs to determine possible query suggestions. However, for customized search systems in the enterprise domain, intranet search, or personalized search such as email or desktop search or for infrequent queries, query logs are either not available or the user base and the number of past user queries is too small to learn appropriate models. We propose a probabilistic mechanism for generating query suggestions from the corpus without using query logs. We utilize the document corpus to extract a set of candidate phrases. As soon as a user starts typing a query, phrases that are highly correlated with the partial user query are selected as completions of the partial query and are offered as query suggestions. Our proposed approach is tested on a variety of datasets and is compared with state-of-the-art approaches. The experimental results clearly demonstrate the effectiveness of our approach in suggesting queries with higher quality.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122445136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 172
Semantic tag recommendation using concept model 基于概念模型的语义标签推荐
Chenliang Li, Anwitaman Datta, Aixin Sun
{"title":"Semantic tag recommendation using concept model","authors":"Chenliang Li, Anwitaman Datta, Aixin Sun","doi":"10.1145/2009916.2010098","DOIUrl":"https://doi.org/10.1145/2009916.2010098","url":null,"abstract":"The common tags given by multiple users to a particular document are often semantically relevant to the document and each tag represents a specific topic. In this paper, we attempt to emulate human tagging behavior to recommend tags by considering the concepts contained in documents. Specifically, we represent each document using a few most relevant concepts contained in the document, where the concept space is derived from Wikipedia. Tags are then recommended based on the tag concept model derived from the annotated documents of each tag. Evaluated on a Delicious dataset of more than 53K documents, the proposed technique achieved comparable tag recommendation accuracy as the state-of-the-art, while yielding an order of magnitude speed-up.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127054408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Bootstrapping subjectivity detection 自举主观性检测
V. Jijkoun, M. de Rijke
{"title":"Bootstrapping subjectivity detection","authors":"V. Jijkoun, M. de Rijke","doi":"10.1145/2009916.2010081","DOIUrl":"https://doi.org/10.1145/2009916.2010081","url":null,"abstract":"We describe a method for automatically generating subjectivity clues for a specific topic and a set of (relevant) document, evaluating it on the task of classifying sentences w.r.t. subjectivity, with improvements over previous work.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125273133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
CrowdTracker: enabling community-based real-time web monitoring CrowdTracker:实现基于社区的实时网络监控
James Caverlee, Zhiyuan Cheng, B. Eoff, Chiao-Fang Hsu, K. Kamath, Jeffrey McGee
{"title":"CrowdTracker: enabling community-based real-time web monitoring","authors":"James Caverlee, Zhiyuan Cheng, B. Eoff, Chiao-Fang Hsu, K. Kamath, Jeffrey McGee","doi":"10.1145/2009916.2010161","DOIUrl":"https://doi.org/10.1145/2009916.2010161","url":null,"abstract":"CrowdTracker is a community-based web monitoring system optimized for real-time web streams like Twitter, Facebook, and Google Buzz. In this demo summary, we provide an overview of the system and architecture, and outline the demonstration plan.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127889615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
On the suitability of diversity metrics for learning-to-rank for diversity 论多样性指标在多样性排序学习中的适用性
Rodrygo L. T. Santos, C. Macdonald, I. Ounis
{"title":"On the suitability of diversity metrics for learning-to-rank for diversity","authors":"Rodrygo L. T. Santos, C. Macdonald, I. Ounis","doi":"10.1145/2009916.2010111","DOIUrl":"https://doi.org/10.1145/2009916.2010111","url":null,"abstract":"An optimally diverse ranking should achieve the maximum coverage of the aspects underlying an ambiguous or underspecified query, with minimum redundancy with respect to the covered aspects. Although evaluation metrics that reward coverage and penalise redundancy provide intuitive objective functions for learning a diverse ranking, it is unclear whether they are the most effective. In this paper, we contrast the suitability of relevance and diversity metrics as objective functions for learning a diverse ranking. Our results in the context of the diversity task of the TREC 2009 and 2010 Web tracks show that diversity metrics are not necessarily better suited for guiding a learning approach. Moreover, the suitability of these metrics is compromised as they try to penalise redundancy during the learning process.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127658295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Information organization and retrieval with collaboratively generated content 使用协作生成的内容进行信息组织和检索
Eugene Agichtein, E. Gabrilovich
{"title":"Information organization and retrieval with collaboratively generated content","authors":"Eugene Agichtein, E. Gabrilovich","doi":"10.1145/2009916.2010173","DOIUrl":"https://doi.org/10.1145/2009916.2010173","url":null,"abstract":"Proliferation of ubiquitous access to the Internet enables millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). In this tutorial, we will discuss organizing and exploiting Collaboratively Generated Content (CGC) for information organization and retrieval. Specifically, we intend to cover two complementary areas of the problem: (1) using such content as a powerful enabling resource for knowledge-enriched, intelligent representations and new information retrieval algorithms, and (2) development of supporting technologies for extracting, filtering, and organizing collaboratively created content. The unprecedented amounts of information in CGC enable new, knowledge-rich approaches to information access, which are significantly more powerful than the conventional word-based methods. Considerable progress has been made in this direction over the last few years. Examples include explicit manipulation of human-defined concepts and their use to augment the bag of words (cf. Explicit Semantic Analysis), using large-scale taxonomies of topics from Wikipedia or the Open Directory Project to construct additional class-based features, or using Wikipedia for better word sense disambiguation. However, the quality and comprehensiveness of collaboratively created content vary widely, and in order for this resource to be useful, a significant amount of preprocessing, filtering, and organization is necessary. Consequently, new methods for analyzing CGC and corresponding user interactions are required to effectively harness the resulting knowledge. Thus, not only the content repositories can be used to improve IR methods, but the reverse pollination is also possible, as better information extraction methods can be used for automatically collecting more knowledge, or verifying the contributed content. This natural connection between modeling the generation process of CGC and effectively using the accumulated knowledge suggests covering both areas together in a single tutorial. The intended audience of the tutorial includes IR researchers and graduate students, who would like to learn about the recent advances and research opportunities in working with collaboratively generated content. The emphasis of the tutorial is on comparing the existing approaches and presenting practical techniques that IR practitioners can use in their research. We also cover open research challenges, as well as survey available resources (software tools and data) for getting started in this research field.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128129105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization 基于非负矩阵三因子分解的双知识迁移跨语言网页分类
Hua Wang, Heng Huang, F. Nie, C. Ding
{"title":"Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization","authors":"Hua Wang, Heng Huang, F. Nie, C. Ding","doi":"10.1145/2009916.2010041","DOIUrl":"https://doi.org/10.1145/2009916.2010041","url":null,"abstract":"The lack of sufficient labeled Web pages in many languages, especially for those uncommonly used ones, presents a great challenge to traditional supervised classification methods to achieve satisfactory Web page classification performance. To address this, we propose a novel Nonnegative Matrix Tri-factorization (NMTF) based Dual Knowledge Transfer (DKT) approach for cross-language Web page classification, which is based on the following two important observations. First, we observe that Web pages for a same topic from different languages usually share some common semantic patterns, though in different representation forms. Second, we also observe that the associations between word clusters and Web page classes are a more reliable carrier than raw words to transfer knowledge across languages. With these recognitions, we attempt to transfer knowledge from the auxiliary language, in which abundant labeled Web pages are available, to target languages, in which we want classify Web pages, through two different paths: word cluster approximations and the associations between word clusters and Web page classes. Due to the reinforcement between these two different knowledge transfer paths, our approach can achieve better classification accuracy. We evaluate the proposed approach in extensive experiments using a real world cross-language Web page data set. Promising results demonstrate the effectiveness of our approach that is consistent with our theoretical analyses.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"7 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131803834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 82
UPS: efficient privacy protection in personalized web search UPS:个性化网页搜索中高效的隐私保护
Gang Chen, He Bai, L. Shou, Ke Chen, Yunjun Gao
{"title":"UPS: efficient privacy protection in personalized web search","authors":"Gang Chen, He Bai, L. Shou, Ke Chen, Yunjun Gao","doi":"10.1145/2009916.2009999","DOIUrl":"https://doi.org/10.1145/2009916.2009999","url":null,"abstract":"In recent years, personalized web search (PWS) has demonstrated effectiveness in improving the quality of search service on the Internet. Unfortunately, the need for collecting private information in PWS has become a major barrier for its wide proliferation. We study privacy protection in PWS engines which capture personalities in user profiles. We propose a PWS framework called UPS that can generalize profiles in for each query according to user-specified privacy requirements. Two predictive metrics are proposed to evaluate the privacy breach risk and the query utility for hierarchical user profile. We develop two simple but effective generalization algorithms for user profiles allowing for query-level customization using our proposed metrics. We also provide an online prediction mechanism based on query utility for deciding whether to personalize a query in UPS. Extensive experiments demonstrate the efficiency and effectiveness of our framework.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133160168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 58
Regularized latent semantic indexing 正则化潜在语义索引
Quan Wang, Jun Xu, Hang Li, Nick Craswell
{"title":"Regularized latent semantic indexing","authors":"Quan Wang, Jun Xu, Hang Li, Nick Craswell","doi":"10.1145/2009916.2010008","DOIUrl":"https://doi.org/10.1145/2009916.2010008","url":null,"abstract":"Topic modeling can boost the performance of information retrieval, but its real-world application is limited due to scalability issues. Scaling to larger document collections via parallelization is an active area of research, but most solutions require drastic steps such as vastly reducing input vocabulary. We introduce Regularized Latent Semantic Indexing (RLSI), a new method which is designed for parallelization. It is as effective as existing topic models, and scales to larger datasets without reducing input vocabulary. RLSI formalizes topic modeling as a problem of minimizing a quadratic loss function regularized by l₂ and/or l₁ norm. This formulation allows the learning process to be decomposed into multiple sub-optimization problems which can be optimized in parallel, for example via MapReduce. We particularly propose adopting l₂ norm on topics and l₁ norm on document representations, to create a model with compact and readable topics and useful for retrieval. Relevance ranking experiments on three TREC datasets show that RLSI performs better than LSI, PLSI, and LDA, and the improvements are sometimes statistically significant. Experiments on a web dataset, containing about 1.6 million documents and 7 million terms, demonstrate a similar boost in performance on a larger corpus and vocabulary than in previous studies.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133700701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 77
Practical online retrieval evaluation 实用在线检索评价
Filip Radlinski, Yisong Yue
{"title":"Practical online retrieval evaluation","authors":"Filip Radlinski, Yisong Yue","doi":"10.1145/2009916.2010171","DOIUrl":"https://doi.org/10.1145/2009916.2010171","url":null,"abstract":"Online evaluation is amongst the few evaluation techniques available to the information retrieval community that is guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers to any evaluation of retrieval quality conducted while observing user behavior in a natural context. However, it is rarely employed outside of large commercial search engines due primarily to a perception that it is impractical at small scales. The goal of this tutorial is to familiarize information retrieval researchers with state-of-the-art techniques in evaluating information retrieval systems based on natural user clicking behavior, as well as to show how such methods can be practically deployed. In particular, our focus will be on demonstrating how the Interleaving approach and other click based techniques contrast with traditional offline evaluation, and how these online methods can be effectively used in academic-scale research. In addition to lecture notes, we will also provide sample software and code walk-throughs to showcase the ease with which Interleaving and other click-based methods can be employed by students, academics and other researchers.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122514787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信