VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. Pub Date: 2023-07-18. DOI: 10.1145/3539618.3591802

Wojciech Kusa, Aldo Lipani, Petr Knoth, A. Hanbury


Citations: 3

Abstract

The objective of High-Recall Information Retrieval (HRIR) is to retrieve as many relevant documents as possible for a given search topic. One approach to HRIR is Technology-Assisted Review (TAR), which uses information retrieval and machine learning techniques to support the review of large document collections. TAR systems are commonly used in legal eDiscovery and in systematic literature reviews. Successful TAR systems find the majority of relevant documents with the fewest possible assessments. The commonly used retrospective evaluation assumes that the system first reaches a specific, fixed recall level and then measures precision or work saved at that level (e.g., precision at r% recall). This setup makes it hard to understand how evaluation measures behave at a fixed recall level, and it complicates estimating the time and money saved during a technology-assisted review. This paper presents a new visual analytics tool for exploring how evaluation measures change with the recall level. We implemented 18 evaluation measures based on confusion matrix terms, covering both general IR measures and measures specific to TAR. The tool allows the behaviour of these measures to be compared in a fixed-recall evaluation setting. It can also simulate time and money savings, as well as the number of manual versus automatic assessments, for different datasets depending on model quality. The tool is open source, and the demo is available at https://vombat.streamlit.app.
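The fixed-recall setting described in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not VoMBaT's implementation: it assumes a dataset of `n_docs` documents containing `n_relevant` relevant ones, treats the number of false positives at the cut-off (`fp`) as a hypothetical stand-in for model quality, and uses an illustrative cost model (`seconds_per_doc`, `hourly_rate`) in which every document at or above the cut-off is assessed manually. All function and parameter names are assumptions, and only a handful of the 18 measures are shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # relevant documents found (at or above the cut-off)
    fp: int  # irrelevant documents the reviewer still has to assess
    tn: int  # irrelevant documents excluded automatically
    fn: int  # relevant documents missed

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def at_fixed_recall(n_docs: int, n_relevant: int, recall: float, fp: int) -> ConfusionMatrix:
    """Confusion matrix of a hypothetical system that has just reached `recall`.

    Assumption: `fp` (false positives at the cut-off) stands in for model quality.
    """
    if not 0 < n_relevant <= n_docs:
        raise ValueError("need 0 < n_relevant <= n_docs")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must lie in (0, 1]")
    if not 0 <= fp <= n_docs - n_relevant:
        raise ValueError("fp must lie between 0 and the number of irrelevant documents")
    # Round before ceil so float noise (e.g. 0.07 * 100 = 7.000000000000001)
    # does not push TP one document too high.
    tp = math.ceil(round(recall * n_relevant, 9))
    return ConfusionMatrix(tp=tp, fp=fp, tn=n_docs - n_relevant - fp, fn=n_relevant - tp)


def measures(cm: ConfusionMatrix) -> dict:
    """A few confusion-matrix measures (illustrative subset)."""
    recall = cm.tp / (cm.tp + cm.fn)
    precision = cm.tp / (cm.tp + cm.fp)
    f1 = 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)
    tnr = cm.tn / (cm.tn + cm.fp) if cm.tn + cm.fp else 0.0
    # Work Saved over Sampling: share of documents not screened manually,
    # minus the (1 - recall) share that random sampling would also skip.
    wss = (cm.tn + cm.fn) / cm.n - (1.0 - recall)
    return {"precision": precision, "recall": recall, "f1": f1, "tnr": tnr, "wss": wss}


def savings(cm: ConfusionMatrix, seconds_per_doc: float, hourly_rate: float) -> dict:
    """Hypothetical cost model: TP + FP documents are assessed manually,
    TN + FN documents are excluded automatically."""
    manual = cm.tp + cm.fp
    automatic = cm.tn + cm.fn
    hours_saved = automatic * seconds_per_doc / 3600
    return {"manual": manual, "automatic": automatic,
            "hours_saved": hours_saved, "money_saved": hours_saved * hourly_rate}


if __name__ == "__main__":
    cm = at_fixed_recall(n_docs=2000, n_relevant=100, recall=0.95, fp=400)
    print(cm)
    print(measures(cm))
    print(savings(cm, seconds_per_doc=30, hourly_rate=50))
```

With these example inputs the matrix is TP=95, FP=400, TN=1500, FN=5, and WSS at 95% recall is 0.7025 (up to floating-point rounding). Because the sketch fixes the cut-off exactly where the target recall is reached, `fp` is the only free parameter; sweeping it from 0 to the number of irrelevant documents gives one possible model-quality axis, though the tool's own parameterisation may differ.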
