Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering

Zhengbao Jiang, Jun Araki, Haibo Ding, Graham Neubig
{"title":"Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering","authors":"Zhengbao Jiang, J. Araki, Haibo Ding, Graham Neubig","doi":"10.48550/arXiv.2210.04234","DOIUrl":null,"url":null,"abstract":"Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose multi-hop questions into multiple corresponding single-hop questions, and find marked inconsistency in QA models’ answers on these pairs of ostensibly identical question chains. Second, we find that models lack zero-shot multi-hop reasoning ability: when trained only on single-hop questions, models generalize poorly to multi-hop questions. Finally, we demonstrate that it is possible to improve models’ zero-shot multi-hop reasoning capacity through two methods that approximate real multi-hop natural language (NL) questions by training on either concatenation of single-hop questions or logical forms (SPARQL). In sum, these results demonstrate that multi-hop reasoning does not emerge naturally in generative QA models, but can be encouraged by advances in training or modeling techniques. Code is available at https://github.com/jzbjyb/multihop.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"1 1","pages":"1765-1775"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of COLING. International Conference on Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.04234","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose multi-hop questions into multiple corresponding single-hop questions, and find marked inconsistency in QA models’ answers on these pairs of ostensibly identical question chains. Second, we find that models lack zero-shot multi-hop reasoning ability: when trained only on single-hop questions, models generalize poorly to multi-hop questions. Finally, we demonstrate that it is possible to improve models’ zero-shot multi-hop reasoning capacity through two methods that approximate real multi-hop natural language (NL) questions by training on either concatenation of single-hop questions or logical forms (SPARQL). In sum, these results demonstrate that multi-hop reasoning does not emerge naturally in generative QA models, but can be encouraged by advances in training or modeling techniques. Code is available at https://github.com/jzbjyb/multihop.
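To make the abstract's two approximation methods concrete, here is a minimal sketch of how a 2-hop question relates to its single-hop decomposition, and how the two training inputs described above (concatenated single-hop questions, and a SPARQL logical form) might look. This is illustrative only, not the authors' released code (see https://github.com/jzbjyb/multihop for that); the example question, the "#1" placeholder convention, and the SPARQL predicate names are assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
# The example question, the "#1" placeholder convention, and the
# SPARQL predicates below are assumptions for exposition.

# A 2-hop question and its decomposition into a single-hop chain,
# where the answer to hop 1 fills the "#1" slot in hop 2.
multi_hop_question = "Who is the mother of the director of Pulp Fiction?"
single_hop_chain = [
    "Who is the director of Pulp Fiction?",  # hop 1 -> "Quentin Tarantino"
    "Who is the mother of #1?",              # hop 2, conditioned on hop 1's answer
]

# Approximation 1: train on the concatenation of the single-hop
# questions, so the input resembles a real multi-hop NL question.
concatenated_input = " ".join(single_hop_chain)

# Approximation 2: train on a logical form (SPARQL) expressing the
# same 2-hop query; predicate names here are schematic.
sparql_form = (
    "SELECT ?ans WHERE { "
    "?film rdfs:label 'Pulp Fiction' . "
    "?film :director ?d . "
    "?d :mother ?ans }"
)

print(concatenated_input)
print(sparql_form)
```

Under this framing, consistency between a model's answers to `multi_hop_question` and to the chain in `single_hop_chain` is the paper's first probe, and the two constructed inputs are the training-time approximations used to encourage zero-shot multi-hop generalization.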