Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval最新文献_第3页

Efficient and effective solutions for search engines 高效和有效的解决方案的搜索引擎

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010179

Xiangfei Jia

引用次数: 0

Faster top-k document retrieval using block-max indexes 使用块最大索引更快地检索top-k文档

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010048

Shuai Ding, Torsten Suel

{"title":"Faster top-k document retrieval using block-max indexes","authors":"Shuai Ding, Torsten Suel","doi":"10.1145/2009916.2010048","DOIUrl":"https://doi.org/10.1145/2009916.2010048","url":null,"abstract":"Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization techniques called early termination achieves faster query processing by avoiding the scoring of documents that are unlikely to be in the top results. We study new algorithms for early termination that outperform previous methods. In particular, we focus on safe techniques for disjunctive queries, which return the same result as an exhaustive evaluation over the disjunction of the query terms. The current state-of-the-art methods for this case, the WAND algorithm by Broder et al. [11] and the approach of Strohman and Croft [30], achieve great benefits but still leave a large performance gap between disjunctive and (even non-early terminated) conjunctive queries. We propose a new set of algorithms by introducing a simple augmented inverted index structure called a block-max index. Essentially, this is a structure that stores the maximum impact score for each block of a compressed inverted list in uncompressed form, thus enabling us to skip large parts of the lists. We show how to integrate this structure into the WAND approach, leading to considerable performance gains. We then describe extensions to a layered index organization, and to indexes with reassigned document IDs, that achieve additional gains that narrow the gap between disjunctive and conjunctive top-k query processing.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115339971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 209

Associative tag recommendation exploiting multiple textual features 利用多种文本特征的关联标签推荐

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010053

F. Belém, E. Martins, Tatiana Pontes, J. Almeida, Marcos André Gonçalves

{"title":"Associative tag recommendation exploiting multiple textual features","authors":"F. Belém, E. Martins, Tatiana Pontes, J. Almeida, Marcos André Gonçalves","doi":"10.1145/2009916.2010053","DOIUrl":"https://doi.org/10.1145/2009916.2010053","url":null,"abstract":"This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags pre-assigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object's content. We also exploit two learning-to-rank (L2R) techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely, LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Further improvements can also be achieved with the new L2R strategies, which have the additional advantage of being quite flexible and extensible to exploit other aspects of the tag recommendation problem.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115695418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 44

Predicting eBay listing conversion 预测eBay上市转化率

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010188

T. Yuan, Zhaohui Chen, Mike Mathieson

引用次数: 7

Personalized video: leanback online video consumption 个性化视频:leanback在线视频消费

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010158

Krishnan Ramanathan, Y. Sankarasubramaniam, Vidhya Govindaraju

引用次数: 1

iMecho: a context-aware desktop search system iMecho:一个上下文感知的桌面搜索系统

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010154

Jidong Chen, Hang Guo, Wentao Wu, Wei Wang

引用次数: 4

Multi-objective optimization in learning to rank 排序学习中的多目标优化

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010139

Na Dai, Milad Shokouhi, Brian D. Davison

引用次数: 7

Do IR models satisfy the TDC retrieval constraint IR模型是否满足TDC检索约束

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010096

S. Clinchant, Éric Gaussier

引用次数: 2

Enriching document representation via translation for improved monolingual information retrieval 通过翻译丰富文档表示以改进单语信息检索

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010030

Seung-Hoon Na, H. Ng

{"title":"Enriching document representation via translation for improved monolingual information retrieval","authors":"Seung-Hoon Na, H. Ng","doi":"10.1145/2009916.2010030","DOIUrl":"https://doi.org/10.1145/2009916.2010030","url":null,"abstract":"Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated into an auxiliary language, and the resulting translated document serves as a semantically enhanced representation for supplementing the original bag of words. The core of our translation representation is the expected term frequency of a word in a translated document, which is calculated by averaging the term frequencies over all possible translations, rather than focusing on the 1-best translation only. To achieve better efficiency of translation, we do not rely on full-fledged machine translation, but instead use monotonic translation by removing the time-consuming reordering component. Experiments carried out on standard TREC test collections show that our proposed translation representation leads to statistically significant improvements over using only the original language of the document collection.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128752345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Scalable multi-dimensional user intent identification using tree structured distributions 使用树状结构分布的可扩展多维用户意图识别

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009971

Vinay Jethava, Liliana Calderón-Benavides, R. Baeza-Yates, C. Bhattacharyya, Devdatt P. Dubhashi

{"title":"Scalable multi-dimensional user intent identification using tree structured distributions","authors":"Vinay Jethava, Liliana Calderón-Benavides, R. Baeza-Yates, C. Bhattacharyya, Devdatt P. Dubhashi","doi":"10.1145/2009916.2009971","DOIUrl":"https://doi.org/10.1145/2009916.2009971","url":null,"abstract":"The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observed from query words alone. Accurate identification of Intent from query words remains a challenging problem primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded due to lack of representative training sample. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using tree structured distribution which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporated WordNet to extend the system capabilities to queries which contain words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127259356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14