搜索广告的概率首遍检索:从理论到实践

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI:10.1145/1871437.1871567

Hema Raghavan, R. Iyer

{"title":"搜索广告的概率首遍检索:从理论到实践","authors":"Hema Raghavan, R. Iyer","doi":"10.1145/1871437.1871567","DOIUrl":null,"url":null,"abstract":"Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents of a corpus for a given query. In addition to ranking the ad documents, we also need to filter or threshold irrelevant ads from participating in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporate query segmentation and phrases in the LM framework, discuss impact of score normalization for relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data and also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with 8% coverage increase). Finally, we hope that this paper provides the reader with adequate insights into the challenges of building a system that serves millions of users every day.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Probabilistic first pass retrieval for search advertising: from theory to practice\",\"authors\":\"Hema Raghavan, R. Iyer\",\"doi\":\"10.1145/1871437.1871567\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents of a corpus for a given query. In addition to ranking the ad documents, we also need to filter or threshold irrelevant ads from participating in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporate query segmentation and phrases in the LM framework, discuss impact of score normalization for relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data and also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with 8% coverage increase). Finally, we hope that this paper provides the reader with adequate insights into the challenges of building a system that serves millions of users every day.\",\"PeriodicalId\":310611,\"journal\":{\"name\":\"Proceedings of the 19th ACM international conference on Information and knowledge management\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1871437.1871567\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1871437.1871567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

搜索广告中的信息检索与其他特别检索任务一样，目的是为给定查询找到语料库中广告文档的最适当排序。除了对广告文件进行排名外，我们还需要过滤或限制参与拍卖的无关广告，以便与搜索结果一起显示。在这项工作中，我们描述了我们在基于语言建模(LM)检索框架的商业搜索引擎中实现成功的广告检索系统的经验。与当时生产的基线向量空间模型(TF-IDF)系统相比，LM显示出显著的性能改进。从建模的角度来看，我们提出了一种在LM框架中合并查询分割和短语的新方法，讨论了分数规范化对相关性过滤的影响，并提出了使用查询重写技术合并查询扩展的初步结果。从实现的角度来看，我们还讨论了产品搜索引擎的实时延迟限制，以及如何通过调整WAND算法来使用语言模型来克服这些限制。总而言之，我们的LM公式在准确性指标方面表现得更好，比如编辑数据的Precision-Recall (AUC提高了10%)和nDCG (nDCG@5提高了8%)，并且在实时用户测试中显示了点击量的显著提高(点击率提高了0.787%，覆盖率提高了8%)。最后，我们希望本文能让读者充分了解构建一个每天为数百万用户服务的系统所面临的挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Probabilistic first pass retrieval for search advertising: from theory to practice

Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents of a corpus for a given query. In addition to ranking the ad documents, we also need to filter or threshold irrelevant ads from participating in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporate query segmentation and phrases in the LM framework, discuss impact of score normalization for relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data and also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with 8% coverage increase). Finally, we hope that this paper provides the reader with adequate insights into the challenges of building a system that serves millions of users every day.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 19th ACM international conference on Information and knowledge management

自引率

0.00%

发文量