Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa
{"title":"基于强化学习的开放域复杂问题答案选择","authors":"Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa","doi":"10.1145/3459104.3459149","DOIUrl":null,"url":null,"abstract":"Multiple-choice question answering for the open domain is a task that consists of answering challenging questions from multiple domains, without direct pieces of evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs in order to find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. MCRL learns to generate queries that improve the evidence found for each answer choice, using iteration cycles. After a predefined number of iteration cycles, MCRL provides the best answer choice and the text passages that support it. We use accuracy and mean reward per episode to conduct an in-depth hyperparameter analysis of the number of iteration cycles, reward function design, and weight of the pieces of evidence found in each iteration cycle on the final answer choice. The MCRL model with the best performance reached an accuracy of 0.346, a value higher than naive, random, and the traditional end-to-end deep learning QA models. We conclude with recommendations for future developments of the model, which can be adapted for different languages using text corpora and word embedding models for each language.","PeriodicalId":322229,"journal":{"name":"International Symposium on Electrical, Electronics and Information Engineering","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Answer Selection Using Reinforcement Learning for Complex Question Answering on the Open Domain\",\"authors\":\"Angel Felipe Magnossão de Paula, Roberto Fray da Silva, B. Nishimoto, C. Cugnasca, Anna Helena Reali Costa\",\"doi\":\"10.1145/3459104.3459149\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiple-choice question answering for the open domain is a task that consists of answering challenging questions from multiple domains, without direct pieces of evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs in order to find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. MCRL learns to generate queries that improve the evidence found for each answer choice, using iteration cycles. After a predefined number of iteration cycles, MCRL provides the best answer choice and the text passages that support it. We use accuracy and mean reward per episode to conduct an in-depth hyperparameter analysis of the number of iteration cycles, reward function design, and weight of the pieces of evidence found in each iteration cycle on the final answer choice. The MCRL model with the best performance reached an accuracy of 0.346, a value higher than naive, random, and the traditional end-to-end deep learning QA models. 
We conclude with recommendations for future developments of the model, which can be adapted for different languages using text corpora and word embedding models for each language.\",\"PeriodicalId\":322229,\"journal\":{\"name\":\"International Symposium on Electrical, Electronics and Information Engineering\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium on Electrical, Electronics and Information Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3459104.3459149\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Electrical, Electronics and Information Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459104.3459149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Answer Selection Using Reinforcement Learning for Complex Question Answering on the Open Domain
Open-domain multiple-choice question answering is the task of answering challenging questions from multiple domains without direct evidence in the text corpora. The main application of multiple-choice question answering is self-tutoring. We propose the Multiple-Choice Reinforcement Learner (MCRL) model, which uses a policy gradient algorithm in a partially observable Markov decision process to reformulate question-answer pairs and find new pieces of evidence to support each answer choice. Its inputs are the question and the answer choices. Over successive iteration cycles, MCRL learns to generate queries that improve the evidence found for each answer choice. After a predefined number of iteration cycles, MCRL returns the best answer choice and the text passages that support it. Using accuracy and mean reward per episode, we conduct an in-depth hyperparameter analysis of the number of iteration cycles, the reward function design, and the weight given to the evidence found in each iteration cycle when selecting the final answer. The best-performing MCRL model reached an accuracy of 0.346, higher than the naive, random, and traditional end-to-end deep learning QA models. We conclude with recommendations for future development of the model, which can be adapted to other languages by using language-specific text corpora and word embedding models.
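To make the described loop concrete, the sketch below shows one plausible reading of the abstract: a policy reformulates the query over a fixed number of iteration cycles, evidence is retrieved and scored for each answer choice with a per-cycle weight, and the policy is updated with a simple policy gradient (REINFORCE) on the episode reward. All function names, the reformulation templates, the toy retriever and scorer, and the reward design are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hedged sketch of an MCRL-style episode, under the assumptions stated above.
# "theta" stands in for the parameters of a query-reformulation policy.

TEMPLATES = [                         # hypothetical reformulation actions
    "{q}",
    "{q} {choice}",
    "evidence for {choice}: {q}",
]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def retrieve(query, corpus):
    """Toy retriever: passages sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.lower().split())]

def evidence_score(choice, passages):
    """Toy scorer: fraction of retrieved passages mentioning the choice."""
    return (sum(choice.lower() in p.lower() for p in passages) / len(passages)
            if passages else 0.0)

def run_episode(question, choices, answer, corpus, theta,
                n_cycles=3, cycle_weight=0.5, lr=0.1):
    """One episode: reformulate, retrieve, score; then a REINFORCE update."""
    totals = {c: 0.0 for c in choices}
    taken = []                                    # actions chosen this episode
    probs = softmax(theta)
    for cycle in range(n_cycles):
        a = random.choices(range(len(TEMPLATES)), weights=probs)[0]
        taken.append(a)
        for c in choices:                         # one query per answer choice
            query = TEMPLATES[a].format(q=question, choice=c)
            passages = retrieve(query, corpus)
            totals[c] += (cycle_weight ** cycle) * evidence_score(c, passages)
    picked = max(totals, key=totals.get)
    reward = 1.0 if picked == answer else -1.0    # assumed terminal reward
    # REINFORCE: move theta toward the taken actions, scaled by the reward.
    for a in taken:
        for i in range(len(theta)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * reward * grad
    return picked, reward
```

Repeatedly calling run_episode over a training set would correspond to the training loop; in the paper the reformulation policy is presumably a learned neural model rather than the tabular softmax used here, and the number of cycles, reward shape, and cycle weights are exactly the hyperparameters the abstract says were analyzed.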