MoQA: Benchmarking Multi-Type Open-Domain Question Answering

Ho-Ching Yen, Tianyu Gao, Jinhyuk Lee, Danqi Chen
{"title":"MoQA: Benchmarking Multi-Type Open-Domain Question Answering","authors":"Ho-Ching Yen, Tianyu Gao, Jinhyuk Lee, Danqi Chen","doi":"10.18653/v1/2023.dialdoc-1.2","DOIUrl":null,"url":null,"abstract":"Previous research on open-domain question answering (QA) mainly focuses on questions with short answers. However, information-seeking QA often requires various formats of answers depending on the nature of the questions, e.g., why/how questions typically require a long answer. In this paper, we present MoQA, a benchmark for open-domain QA that requires building one system that can provide short, medium, long, and yes/no answers to different questions accordingly. MoQA builds upon Natural Questions with multiple types of questions and additional crowdsourcing efforts to ensure high query quality. We adapt state-of-the-art models, and reveal unique findings in multi-type open-domain QA: (1) For retriever-reader models, training one retriever on all types achieves the overall best performance, but it is challenging to train one reader model to output answers of different formats, or to train a question classifier to distinguish between types; (2) An end-to-end closed-book QA model trained on multiple types struggles with the task across the board; (3) State-of-the-art large language models such as the largest GPT-3 models (Brown et al., 2020; Ouyang et al., 2022) also lag behind open-book QA models. Our benchmark and analysis call for more effort into building versatile open-domain QA models in the future.","PeriodicalId":190893,"journal":{"name":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2023.dialdoc-1.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Previous research on open-domain question answering (QA) mainly focuses on questions with short answers. However, information-seeking QA often requires various formats of answers depending on the nature of the questions; e.g., why/how questions typically require a long answer. In this paper, we present MoQA, a benchmark for open-domain QA that requires building one system that can provide short, medium, long, and yes/no answers to different questions accordingly. MoQA builds upon Natural Questions with multiple types of questions and additional crowdsourcing efforts to ensure high query quality. We adapt state-of-the-art models and reveal unique findings in multi-type open-domain QA: (1) For retriever-reader models, training one retriever on all types achieves the overall best performance, but it is challenging to train one reader model to output answers of different formats, or to train a question classifier to distinguish between types; (2) An end-to-end closed-book QA model trained on multiple types struggles with the task across the board; (3) State-of-the-art large language models such as the largest GPT-3 models (Brown et al., 2020; Ouyang et al., 2022) also lag behind open-book QA models. Our benchmark and analysis call for more effort toward building versatile open-domain QA models in the future.
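
To make the pipeline structure described in the abstract concrete, the following is a minimal Python sketch of a multi-type retriever-reader system: a single retriever shared across answer types, a question-type classifier, and a reader that must produce an answer in the format the type requires. The heuristic classifier and stub retriever/reader below are hypothetical placeholders for illustration only, not the models or training setup used in the paper.

```python
# Hypothetical sketch of a multi-type open-domain QA pipeline in the spirit of MoQA.
# The type heuristic, retriever, and reader are illustrative stubs, not the paper's models.
from typing import List

ANSWER_TYPES = ["yes/no", "short", "medium", "long"]

def classify_question_type(question: str) -> str:
    """Toy heuristic standing in for a trained question-type classifier."""
    q = question.lower()
    if q.startswith(("is ", "are ", "does ", "do ", "can ", "was ", "were ")):
        return "yes/no"
    if q.startswith(("why", "how")):
        return "long"
    if q.startswith(("what is", "what are")):
        return "medium"
    return "short"

def retrieve(question: str, k: int = 5) -> List[str]:
    """Stub retriever: one retriever is shared across all answer types,
    mirroring the finding that a single retriever trained on all types works best."""
    return [f"passage {i} for: {question}" for i in range(k)]

def read(question: str, passages: List[str], answer_type: str) -> str:
    """Stub reader: in practice a reader (per type, or unified) would generate
    an answer in the format that `answer_type` requires."""
    if answer_type == "yes/no":
        return "yes"
    length = {"short": 5, "medium": 30, "long": 120}[answer_type]
    return f"(an answer of roughly {length} words drawn from {len(passages)} passages)"

def answer(question: str) -> str:
    answer_type = classify_question_type(question)
    passages = retrieve(question)
    return read(question, passages, answer_type)

if __name__ == "__main__":
    for q in ["Who wrote Hamlet?", "Why is the sky blue?", "Is Python dynamically typed?"]:
        print(q, "->", answer(q))
```

The routing step is where the paper reports the difficulty: training one reader to emit all formats, or one classifier to pick the format reliably, is harder than sharing a single retriever across types.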