Open Domain Machine Reading Comprehension using InferSent

J. Kim, Chun-Bo Sim, Junyeong Kim, Jun Park, S. Park, S. Jung
{"title":"Open Domain Machine Reading Comprehension using InferSent","authors":"J. Kim, Chun-Bo Sim, Junyeong Kim, Jun Park, S. Park, S. Jung","doi":"10.30693/smj.2022.11.10.89","DOIUrl":null,"url":null,"abstract":"An open domain machine reading comprehension is a model that adds a function to search paragraphs as there are no paragraphs related to a given question. Document searches have an issue of lower performance with a lot of documents despite abundant research with word frequency based TF-IDF. Paragraph selections also have an issue of not extracting paragraph contexts, including sentence characteristics accurately despite a lot of research with word-based embedding. Document reading comprehension has an issue of slow learning due to the growing number of parameters despite a lot of research on BERT. Trying to solve these three issues, this study used BM25 which considered even sentence length and InferSent to get sentence contexts, and proposed an open domain machine reading comprehension with ALBERT to reduce the number of parameters. An experiment was conducted with SQuAD1.1 datasets. BM25 recorded a higher performance of document research than TF-IDF by 3.2%. InferSent showed a higher performance in paragraph selection than Transformer by 0.9%. Finally, as the number of paragraphs increased in document comprehension, ALBERT was 0.4% higher in EM and 0.2% higher in F1.","PeriodicalId":249252,"journal":{"name":"Korean Institute of Smart Media","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Korean Institute of Smart Media","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30693/smj.2022.11.10.89","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

An open domain machine reading comprehension is a model that adds a function to search paragraphs as there are no paragraphs related to a given question. Document searches have an issue of lower performance with a lot of documents despite abundant research with word frequency based TF-IDF. Paragraph selections also have an issue of not extracting paragraph contexts, including sentence characteristics accurately despite a lot of research with word-based embedding. Document reading comprehension has an issue of slow learning due to the growing number of parameters despite a lot of research on BERT. Trying to solve these three issues, this study used BM25 which considered even sentence length and InferSent to get sentence contexts, and proposed an open domain machine reading comprehension with ALBERT to reduce the number of parameters. An experiment was conducted with SQuAD1.1 datasets. BM25 recorded a higher performance of document research than TF-IDF by 3.2%. InferSent showed a higher performance in paragraph selection than Transformer by 0.9%. Finally, as the number of paragraphs increased in document comprehension, ALBERT was 0.4% higher in EM and 0.2% higher in F1.
使用InferSent的开放域机器阅读理解
开放域机器阅读理解是一个模型,当没有与给定问题相关的段落时,它增加了搜索段落的功能。尽管对基于词频的TF-IDF进行了大量研究,但对于大量文档的文档搜索存在性能较低的问题。段落选择也存在不能准确提取段落上下文(包括句子特征)的问题,尽管有很多基于词的嵌入研究。尽管对BERT进行了大量的研究,但由于参数数量的增加,文档阅读理解存在学习缓慢的问题。为了解决这三个问题,本研究使用了考虑均匀句子长度的BM25模型和基于语义的InferSent模型来获取句子上下文,并提出了一种基于ALBERT的开放域机器阅读理解方法来减少参数的数量。实验采用了squaw1.1数据集。BM25在文献研究方面的表现比TF-IDF高3.2%。InferSent在段落选择方面的性能比Transformer高0.9%。最后,随着文档理解段落数的增加,ALBERT在EM和F1中分别提高了0.4%和0.2%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信