Open Domain Machine Reading Comprehension using InferSent

Korean Institute of Smart Media, Pub Date: 2022-11-30, DOI: 10.30693/smj.2022.11.10.89

J. Kim, Chun-Bo Sim, Junyeong Kim, Jun Park, S. Park, S. Jung


Abstract

Open-domain machine reading comprehension extends a reading comprehension model with a retrieval step, because no paragraph related to the question is supplied in advance. Each stage of such a system has a known weakness. Document retrieval has been studied extensively with TF-IDF, which is based on word frequency, but its performance drops as the number of documents grows. Paragraph selection has been studied extensively with word-based embeddings, but these do not accurately capture paragraph context, including sentence-level characteristics. Reading comprehension has been studied extensively with BERT, but training is slow because of the growing number of parameters. To address these three issues, this study uses BM25, which also takes sentence length into account, for document retrieval; InferSent, which captures sentence context, for paragraph selection; and ALBERT, which reduces the number of parameters, for reading comprehension. Experiments were conducted on the SQuAD1.1 dataset. BM25 outperformed TF-IDF in document retrieval by 3.2%. InferSent outperformed Transformer in paragraph selection by 0.9%. Finally, as the number of paragraphs increased in reading comprehension, ALBERT was 0.4% higher in EM and 0.2% higher in F1.
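To make the retrieval stage more concrete, the following is a minimal sketch of BM25 scoring. It is not the authors' implementation: it uses the standard Okapi BM25 formula, and the whitespace tokenizer, the function names `tokenize` and `bm25_scores`, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration. The point it shows is the length normalization that plain TF-IDF term counting lacks: a term match in a short text counts for more than the same match in a long one.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each text in `docs` against `query` with Okapi BM25.

    k1 and b are common default values, assumed here; the abstract
    does not state which values the study used.
    """
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of texts that contain each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: texts longer than average are penalized.
            denom = tf[term] + k1 * (1 - b + b * len(d) / max(avg_len, 1e-9))
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "albert reduces the number of parameters",
        "bm25 ranks documents for a question and also considers their length",
        "infersent encodes a sentence into a fixed vector",
    ]
    for text, score in zip(corpus, bm25_scores("bm25 length", corpus)):
        print(f"{score:.3f}  {text}")
```

In a pipeline like the one described, the highest-scoring texts would be passed on to the paragraph selection stage; how many are kept is a design choice the abstract does not specify.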


