Question Generator System of Sentence Completion in TOEFL Using NLP and K-Nearest Neighbor

Q1 Earth and Planetary Sciences
L. Riza, Anita Dyah Pertiwi, E. F. Rahman, M. Munir, C. U. Abdullah
Indonesian Journal of Science and Technology. Published 2019-09-01. DOI: https://doi.org/10.17509/IJOST.V4I2.18202
Citations: 9

Abstract

The Test of English as a Foreign Language (TOEFL) is a form of learning evaluation that requires high-quality questions. Preparing TOEFL questions by conventional means is time-consuming, and computer technology can help. This research therefore addresses the automatic generation of TOEFL sentence-completion questions. The system consists of the following stages: (1) collection of input data from foreign news media sites with excellent English grammar; (2) preprocessing with Natural Language Processing (NLP); (3) Part-of-Speech (POS) tagging; (4) question feature extraction; (5) separation and selection of news sentences; (6) determination and collection of values for seven features; (7) conversion of categorical data values; (8) classification of the target blank-position word with K-Nearest Neighbor (KNN); (9) determination of heuristic rules from human experts; and (10) selection of options (distractors) based on the heuristic rules. In an experiment on 10 news articles, 20 questions were generated. Evaluation by a human expert rated the generated questions as very good, with a score of 81.93%, and 70% of the blank positions matched those in historical TOEFL questions. It can be concluded that the quality of the generated questions follows the training data drawn from historical TOEFL questions, and that the distractors are of very good quality because they are derived from the heuristics of human experts.
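Stages (7) and (8) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the seven features, so the three POS-context features, the toy training rows, the "blank"/"keep" labels, the Hamming distance, and k = 3 are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical training rows: (POS of previous word, POS of candidate word,
# POS of next word) -> label. The paper uses seven features taken from
# historical TOEFL questions; these three are placeholders.
TRAIN = [
    (("DT", "NN", "VBZ"), "blank"),
    (("PRP", "VBD", "DT"), "blank"),
    (("DT", "JJ", "NN"), "keep"),
    (("IN", "DT", "NN"), "keep"),
    (("NN", "VBZ", "JJ"), "blank"),
    (("VBZ", "DT", "JJ"), "keep"),
]


def build_codebook(rows):
    """Assign an integer code to each categorical value (stage 7)."""
    codebook = {}
    for row in rows:
        for value in row:
            codebook.setdefault(value, len(codebook))
    return codebook


def encode(row, codebook):
    """Convert one row of categories to integer codes; unseen values become -1."""
    return tuple(codebook.get(value, -1) for value in row)


def hamming(a, b):
    """Count the feature positions where two encoded rows differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, train, k=3):
    """Majority vote among the k training rows closest to the query (stage 8)."""
    neighbours = sorted(train, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


CODEBOOK = build_codebook(features for features, _ in TRAIN)
ENCODED_TRAIN = [(encode(features, CODEBOOK), label) for features, label in TRAIN]

if __name__ == "__main__":
    candidate = encode(("DT", "NN", "VBD"), CODEBOOK)
    print(knn_predict(candidate, ENCODED_TRAIN))
```

Hamming distance is used here because the integer codes are arbitrary labels, so only equality between them is meaningful. A system following the paper would replace the toy rows with feature values extracted from historical TOEFL questions.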
Source journal: Indonesian Journal of Science and Technology (Engineering: Engineering (all))
CiteScore: 11.20
Self-citation rate: 0.00%
Articles published: 10
Review time: 16 weeks