Estimating Retrieval Performance Bound for Single Term Queries
Peilin Yang, Hui Fang
Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, 2016-09-12
DOI: 10.1145/2970398.2970428 (https://doi.org/10.1145/2970398.2970428)
Citation count: 4
Abstract
Information retrieval models have been studied for decades. Most traditional retrieval models are based on bag-of-term representations, and they model relevance using various collection statistics. Despite these efforts, the performance of "bag-of-term" based retrieval functions seems to have reached a plateau, and it has become increasingly difficult to improve retrieval performance further. Thus, one important research question is whether we can provide any theoretical justification for the empirical performance bound of basic retrieval functions. In this paper, we start with single-term queries and aim to estimate the performance bound of retrieval functions that leverage only basic ranking signals such as document term frequency, inverse document frequency, and document length normalization. Specifically, we demonstrate that, when only single-term queries are considered, there is a general function that can cover many basic retrieval functions. We then propose to estimate the upper-bound performance of this function by applying a cost/gain analysis to search for the optimal value of the function.
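To illustrate why single-term queries simplify the analysis, the sketch below uses one common BM25 variant as a stand-in for a "basic retrieval function". It is not the general function proposed in the paper, which the abstract does not spell out. For a single query term the inverse document frequency is the same positive constant for every document, so the ranking is determined by term frequency and length normalization alone. The function names, the parameter defaults (k1 = 1.2, b = 0.75), and the toy collection are all assumptions made for illustration.

```python
import math


def bm25_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """BM25 score of one document for a single query term (illustrative variant)."""
    # IDF depends only on collection statistics, not on the document being scored.
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


def tf_length_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """The same score with the IDF factor dropped: a function of tf and length only."""
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def ranking(scores):
    """Document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection: doc id -> (term frequency, document length).
docs = {"d1": (3, 120), "d2": (1, 40), "d3": (5, 400), "d4": (2, 90)}
avg_len = sum(dl for _, dl in docs.values()) / len(docs)

# Hypothetical collection statistics: the term occurs in 4 of 1000 documents.
full = {d: bm25_score(tf, dl, avg_len, df=4, num_docs=1000)
        for d, (tf, dl) in docs.items()}
reduced = {d: tf_length_component(tf, dl, avg_len)
           for d, (tf, dl) in docs.items()}

full_ranking = ranking(full)
reduced_ranking = ranking(reduced)
print(full_ranking == reduced_ranking)
```

Because the two rankings coincide, any rank-based evaluation measure computed for a single-term query is unaffected by the IDF factor in this variant. That is one reason a single function of term frequency and document length can stand in for several basic retrieval functions in this setting.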