Hemanth Kandula, Damianos Karakos, Haoling Qiu, Benjamin Rozonoyer, Ian Soboroff, Lee Tarlin, Bonan Min
QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval
arXiv - CS - Information Retrieval, published 2024-09-07, DOI: arxiv-2409.04667
Abstract
Frequently, users of an Information Retrieval (IR) system start with an
overarching information need (a.k.a., an analytic task) and proceed to define
finer-grained queries covering various important aspects (i.e., sub-topics) of
that analytic task. We present a novel, interactive system called
QueryBuilder, which lets a novice, English-speaking user rapidly develop
cross-lingual information retrieval queries matching their information needs,
with little effort, by efficiently exploring an English development corpus.
QueryBuilder performs near real-time retrieval of documents based on
user-entered search terms; the user looks through the retrieved documents and
marks sentences as relevant to the information needed. The marked sentences are
used by the system as additional information in query formation and refinement:
query terms (and, optionally, event features, which capture event "triggers"
(indicator terms) and agent/patient roles) are appropriately weighted, and a
neural-based system, which better captures textual meaning, retrieves other
relevant content. The process of retrieval and marking is repeated as many
times as desired, giving rise to increasingly refined queries in each
iteration. The final product is a fine-grained query used in Cross-Lingual
Information Retrieval (CLIR). Our experiments using analytic tasks and requests
from the IARPA BETTER IR datasets show that with a small amount of effort (at
most 10 minutes per sub-topic), novice users can form useful
fine-grained queries, including for languages they do not understand. QueryBuilder
also provides beneficial capabilities to the traditional corpus exploration and
query formation process. A demonstration video is available at
https://vimeo.com/734795835
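
The retrieve, mark, and refine loop described in the abstract can be illustrated with a minimal Python sketch. This is not the QueryBuilder implementation: the abstract does not specify the weighting scheme, so the sketch assumes a simple smoothed-IDF boost for terms found in marked sentences, and it omits the neural retrieval component and the event features entirely. The function names (`tokenize`, `score`, `retrieve`, `refine`), the `boost` parameter, and the toy corpus are all hypothetical, and a hard-coded choice stands in for the user's marking step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def score(query_weights, sentence):
    """Sum the weights of the query terms that occur in the sentence."""
    tokens = set(tokenize(sentence))
    return sum(w for term, w in query_weights.items() if term in tokens)


def retrieve(query_weights, corpus, k=3):
    """Return up to k sentences with a positive score, best first."""
    ranked = sorted(corpus, key=lambda s: score(query_weights, s), reverse=True)
    return [s for s in ranked if score(query_weights, s) > 0][:k]


def refine(query_weights, marked, corpus, boost=1.0):
    """Assumed reweighting: add boost * smoothed IDF for each term
    that appears in a sentence the user marked as relevant."""
    n = len(corpus)
    # Document frequency: number of corpus sentences containing each term.
    df = Counter(t for s in corpus for t in set(tokenize(s)))
    updated = dict(query_weights)
    for sentence in marked:
        for term in set(tokenize(sentence)):
            idf = math.log((n + 1) / (df[term] + 1))
            updated[term] = updated.get(term, 0.0) + boost * idf
    return updated


# Toy English "development corpus" (illustrative data only).
corpus = [
    "Protesters marched in the capital on Monday.",
    "Police arrested protesters after the march.",
    "The central bank raised interest rates.",
    "Farmers reported a record wheat harvest.",
]

query = {"protesters": 1.0}          # user-entered search term
hits = retrieve(query, corpus)       # first retrieval pass
marked = [corpus[1]]                 # stand-in for the user marking a sentence
refined = refine(query, marked, corpus)
hits_after = retrieve(refined, corpus)
```

Repeating the `retrieve` and `refine` calls with newly marked sentences mimics the iterative refinement the abstract describes; in this sketch, rare terms from marked sentences gain more weight than common ones such as "the".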