Topic Difficulty: Collection and Query Formulation Effects

ACM Transactions on Information Systems (TOIS), Pub Date: 2021-09-07, DOI: 10.1145/3470563

J. Culpepper, G. Faggioli, N. Ferro, Oren Kurland


Citations: 17

Abstract

Several recent studies have explored the interaction effects between topics, systems, corpora, and components when measuring retrieval effectiveness. However, all of these previous studies assume that a topic or information need is represented by a single query. In reality, users routinely reformulate queries to satisfy an information need. In recent years, there has been renewed interest in the notion of “query variations”, which are essentially multiple user formulations of a single information need. As with retrieval models, some queries are highly effective while others are not. This is often an artifact of the collection being searched, which might be more or less sensitive to word choice. Users rarely have perfect knowledge of the underlying collection, so finding queries that work is often a trial-and-error process. In this work, we explore the fundamental problem of system interaction effects between collections, ranking models, and queries. To address this problem, we formalize the analysis using ANalysis Of VAriance (ANOVA) models that measure the effects of multiple components across collections and topics by nesting multiple query variations within each topic. Our findings show that query formulations have an effect size comparable to that of the topic factor itself, which is known to be the factor with the greatest effect size in prior ANOVA studies. Both topic and formulation have a substantially larger effect size than any other factor, including the ranking algorithms and, surprisingly, even query expansion. This finding reinforces the importance of further research into the role of query rewriting in IR-related tasks.
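The nesting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual ANOVA models, so everything below is an assumption made for illustration: a balanced design with one score per (topic, formulation, system) cell, an additive model in which formulations are nested within topics and systems are crossed with both, and eta-squared (factor sum of squares divided by total sum of squares) as the effect size. The function name `nested_anova_effect_sizes` and the toy scores are hypothetical.

```python
from statistics import mean


def nested_anova_effect_sizes(scores):
    """Eta-squared per factor for a balanced nested design (illustrative only).

    scores[t][f][s] is the effectiveness score of system s on
    formulation f of topic t. Formulations are nested within topics;
    systems are crossed with topics and formulations.
    """
    n_topics = len(scores)
    n_forms = len(scores[0])
    n_systems = len(scores[0][0])

    flat = [y for topic in scores for form in topic for y in form]
    grand = mean(flat)

    # Marginal means needed for the sums of squares.
    topic_means = [mean(y for form in topic for y in form) for topic in scores]
    form_means = [[mean(form) for form in topic] for topic in scores]
    system_means = [
        mean(scores[t][f][s] for t in range(n_topics) for f in range(n_forms))
        for s in range(n_systems)
    ]

    ss_total = sum((y - grand) ** 2 for y in flat)
    ss_topic = n_forms * n_systems * sum((m - grand) ** 2 for m in topic_means)
    # Nested factor: each formulation is compared with its own topic mean.
    ss_form = n_systems * sum(
        (form_means[t][f] - topic_means[t]) ** 2
        for t in range(n_topics)
        for f in range(n_forms)
    )
    ss_system = n_topics * n_forms * sum((m - grand) ** 2 for m in system_means)
    # Whatever is left over (interactions and noise) is treated as error here.
    ss_error = ss_total - ss_topic - ss_form - ss_system

    return {
        "topic": ss_topic / ss_total,
        "formulation(topic)": ss_form / ss_total,
        "system": ss_system / ss_total,
        "error": ss_error / ss_total,
    }


# Toy data (invented): 2 topics x 2 formulations x 2 systems.
toy_scores = [
    [[0.60, 0.70], [0.20, 0.30]],
    [[0.10, 0.20], [0.40, 0.50]],
]
effects = nested_anova_effect_sizes(toy_scores)
for factor, eta_sq in effects.items():
    print(f"{factor}: {eta_sq:.3f}")
```

In this invented example most of the variance sits in the formulation-within-topic term, which mirrors the kind of comparison the abstract describes but says nothing about the paper's actual results. A real analysis would more likely use a statistics package, handle unbalanced data, and model interaction terms explicitly.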

