Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa
{"title":"基于强化学习的开放域复杂问题答案选择","authors":"Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa","doi":"10.1145/3459104.3459149","DOIUrl":null,"url":null,"abstract":"Multiple-choice question answering for the open domain is a task that consists of answering challenging questions from multiple domains, without direct pieces of evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs in order to find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. MCRL learns to generate queries that improve the evidence found for each answer choice, using iteration cycles. After a predefined number of iteration cycles, MCRL provides the best answer choice and the text passages that support it. We use accuracy and mean reward per episode to conduct an in-depth hyperparameter analysis of the number of iteration cycles, reward function design, and weight of the pieces of evidence found in each iteration cycle on the final answer choice. The MCRL model with the best performance reached an accuracy of 0.346, a value higher than naive, random, and the traditional end-to-end deep learning QA models. We conclude with recommendations for future developments of the model, which can be adapted for different languages using text corpora and word embedding models for each language.","PeriodicalId":322229,"journal":{"name":"International Symposium on Electrical, Electronics and Information Engineering","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Answer Selection Using Reinforcement Learning for Complex Question Answering on the Open Domain\",\"authors\":\"Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa\",\"doi\":\"10.1145/3459104.3459149\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiple-choice question answering for the open domain is a task that consists of answering challenging questions from multiple domains, without direct pieces of evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs in order to find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. MCRL learns to generate queries that improve the evidence found for each answer choice, using iteration cycles. After a predefined number of iteration cycles, MCRL provides the best answer choice and the text passages that support it. We use accuracy and mean reward per episode to conduct an in-depth hyperparameter analysis of the number of iteration cycles, reward function design, and weight of the pieces of evidence found in each iteration cycle on the final answer choice. The MCRL model with the best performance reached an accuracy of 0.346, a value higher than naive, random, and the traditional end-to-end deep learning QA models. 
We conclude with recommendations for future developments of the model, which can be adapted for different languages using text corpora and word embedding models for each language.\",\"PeriodicalId\":322229,\"journal\":{\"name\":\"International Symposium on Electrical, Electronics and Information Engineering\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium on Electrical, Electronics and Information Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3459104.3459149\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Electrical, Electronics and Information Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459104.3459149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Answer Selection Using Reinforcement Learning for Complex Question Answering on the Open Domain
Open-domain multiple-choice question answering is the task of answering challenging questions from multiple domains without direct evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs and find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. Over successive iteration cycles, MCRL learns to generate queries that improve the evidence found for each answer choice. After a predefined number of iteration cycles, MCRL returns the best answer choice and the text passages that support it. Using accuracy and mean reward per episode, we conduct an in-depth hyperparameter analysis of the number of iteration cycles, the reward function design, and the weight given to the evidence found in each iteration cycle when selecting the final answer. The best-performing MCRL model reached an accuracy of 0.346, higher than the naive, random, and traditional end-to-end deep learning QA models. We conclude with recommendations for future development of the model, which can be adapted to other languages by using language-specific text corpora and word embedding models.
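To make the described loop concrete, the sketch below shows one plausible reading of the abstract: a policy reformulates the query over a fixed number of iteration cycles, evidence is retrieved and scored for each answer choice with a per-cycle weight, and the policy is updated with a simple policy gradient (REINFORCE) on the episode reward. All function names, the reformulation templates, the toy retriever and scorer, and the reward design are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hedged sketch of an MCRL-style episode, under the assumptions stated above.
# "theta" stands in for the parameters of a query-reformulation policy.

TEMPLATES = [                         # hypothetical reformulation actions
    "{q}",
    "{q} {choice}",
    "evidence for {choice}: {q}",
]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def retrieve(query, corpus):
    """Toy retriever: passages sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.lower().split())]

def evidence_score(choice, passages):
    """Toy scorer: fraction of retrieved passages mentioning the choice."""
    return (sum(choice.lower() in p.lower() for p in passages) / len(passages)
            if passages else 0.0)

def run_episode(question, choices, answer, corpus, theta,
                n_cycles=3, cycle_weight=0.5, lr=0.1):
    """One episode: reformulate, retrieve, score; then a REINFORCE update."""
    totals = {c: 0.0 for c in choices}
    taken = []                                    # actions chosen this episode
    probs = softmax(theta)
    for cycle in range(n_cycles):
        a = random.choices(range(len(TEMPLATES)), weights=probs)[0]
        taken.append(a)
        for c in choices:                         # one query per answer choice
            query = TEMPLATES[a].format(q=question, choice=c)
            passages = retrieve(query, corpus)
            totals[c] += (cycle_weight ** cycle) * evidence_score(c, passages)
    picked = max(totals, key=totals.get)
    reward = 1.0 if picked == answer else -1.0    # assumed terminal reward
    # REINFORCE: move theta toward the taken actions, scaled by the reward.
    for a in taken:
        for i in range(len(theta)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * reward * grad
    return picked, reward
```

Repeatedly calling run_episode over a training set would correspond to the training loop; in the paper the reformulation policy is presumably a learned neural model rather than the tabular softmax used here, and the number of cycles, reward shape, and cycle weights are exactly the hyperparameters the abstract says were analyzed.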