A signal-to-noise approach to score normalization

Proceedings of the 18th ACM conference on Information and knowledge management
Pub Date: 2009-11-02
DOI: 10.1145/1645953.1646055

A. Arampatzis, J. Kamps


Citations: 46

Abstract


A signal-to-noise approach to score normalization

Score normalization is indispensable in distributed retrieval and fusion or meta-search where merging of result-lists is required. Distributional approaches to score normalization with reference to relevance, such as binary mixture models like the normal-exponential, suffer from lack of universality and troublesome parameter estimation especially under sparse relevance. We develop a new approach which tackles both problems by using aggregate score distributions without reference to relevance, and is suitable for uncooperative engines. The method is based on the assumption that scores produced by engines consist of a signal and a noise component which can both be approximated by submitting well-defined sets of artificial queries to each engine. We evaluate in a standard distributed retrieval testbed and show that the signal-to-noise approach yields better results than other distributional methods. As a significant by-product, we investigate query-length distributions.
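To make the idea of normalizing against an aggregate score distribution, without reference to relevance, more concrete, here is a minimal Python sketch. It is not the signal-to-noise method itself, whose details the abstract does not give. It only assumes that each engine's raw scores from a set of sample queries are pooled, and that a new score is mapped to its empirical CDF value in that pool before result lists are merged. All function names and the data layout are hypothetical.

```python
from bisect import bisect_right


def build_aggregate_scores(result_lists):
    """Pool raw scores from many sample queries into one sorted list."""
    return sorted(score for scores in result_lists for score in scores)


def cdf_normalize(score, aggregate):
    """Return the fraction of pooled scores that are <= score (empirical CDF)."""
    if not aggregate:
        raise ValueError("aggregate score sample is empty")
    return bisect_right(aggregate, score) / len(aggregate)


def merge(engine_results, aggregates):
    """Merge per-engine (doc_id, raw_score) lists by normalized score.

    engine_results: dict mapping engine name -> list of (doc_id, raw_score)
    aggregates:     dict mapping engine name -> sorted pooled scores
    """
    merged = [
        (doc_id, cdf_normalize(score, aggregates[engine]))
        for engine, results in engine_results.items()
        for doc_id, score in results
    ]
    # Highest normalized score first.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

Because the mapping uses only the pooled scores of each engine, it needs no relevance judgments and no cooperation from the engine beyond returning scored results for the sample queries.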
