Hemanth Kandula, Damianos Karakos, Haoling Qiu, Benjamin Rozonoyer, Ian Soboroff, Lee Tarlin, Bonan Min
arXiv - CS - Information Retrieval, published 2024-09-07. DOI: arxiv-2409.04667 (https://doi.org/arxiv-2409.04667)
Citations: 0
QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval
Frequently, users of an Information Retrieval (IR) system start with an
overarching information need (a.k.a., an analytic task) and proceed to define
finer-grained queries covering various important aspects (i.e., sub-topics) of
that analytic task. We present a novel, interactive system called
$\textit{QueryBuilder}$, which allows a novice, English-speaking user to create
queries with a small amount of effort, through efficient exploration of an
English development corpus in order to rapidly develop cross-lingual
information retrieval queries corresponding to the user's information needs.
QueryBuilder performs near real-time retrieval of documents based on
user-entered search terms; the user looks through the retrieved documents and
marks sentences as relevant to the information needed. The marked sentences are
used by the system as additional information in query formation and refinement:
query terms (and, optionally, event features, which capture event "triggers"
(indicator terms) and agent/patient roles) are appropriately weighted, and a
neural-based system, which better captures textual meaning, retrieves other
relevant content. The process of retrieval and marking is repeated as many
times as desired, giving rise to increasingly refined queries in each
iteration. The final product is a fine-grained query used in Cross-Lingual
Information Retrieval (CLIR). Our experiments using analytic tasks and requests
from the IARPA BETTER IR datasets show that with a small amount of effort (at
most 10 minutes per sub-topic), novice users can form $\textit{useful}$
fine-grained queries, including in languages they don't understand. QueryBuilder
also adds useful capabilities to the traditional corpus exploration and
query formation process. A demonstration video is available at
https://vimeo.com/734795835
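
The retrieve, mark, and refine loop described above can be illustrated with a small sketch. The abstract does not say how query terms are weighted, so this example swaps in a Rocchio-style relevance-feedback update as a stand-in; it is not the QueryBuilder implementation. All names (`tokenize`, `refine_query`, `score`, `retrieve`), the stopword list, and the `alpha`/`beta` values are hypothetical. The neural retriever, event features, and cross-lingual step are left out.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "its", "this", "after"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def refine_query(query_weights, marked_sentences, alpha=1.0, beta=0.75):
    """Rocchio-style update: scale the current weights by alpha and add
    beta times the average term counts of the sentences marked relevant."""
    updated = defaultdict(float)
    for term, weight in query_weights.items():
        updated[term] = alpha * weight
    for sentence in marked_sentences:
        for term, count in Counter(tokenize(sentence)).items():
            updated[term] += beta * count / len(marked_sentences)
    return dict(updated)


def score(query_weights, document):
    """Weighted term overlap, normalised by the square root of document length."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    raw = sum(weight * counts[term] for term, weight in query_weights.items())
    return raw / math.sqrt(len(tokens))


def retrieve(query_weights, corpus, k=3):
    """Return the k highest-scoring documents for the weighted query."""
    return sorted(corpus, key=lambda doc: score(query_weights, doc), reverse=True)[:k]


if __name__ == "__main__":
    corpus = [
        "Flood waters displaced thousands of residents in the coastal region.",
        "The central bank raised interest rates for the third time this year.",
        "Rescue teams evacuated residents after the river burst its banks.",
    ]
    query = {"flood": 1.0}                      # user-entered search term
    top = retrieve(query, corpus, k=1)          # first retrieval pass
    query = refine_query(query, top)            # user marks the hit as relevant
    print(retrieve(query, corpus, k=2))         # second pass with refined weights
```

In this toy corpus, the refined query picks up terms from the marked sentence, so a second pass can surface a document that never mentions the original search term. Repeating the two calls models one further iteration of the loop.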