Fine-Tuning IndoBERT for Indonesian Exam Question Classification Based on Bloom's Taxonomy

Journal of Information Systems Engineering and Business Intelligence, published 2023-11-01, DOI: 10.20473/jisebi.9.2.253-263

Fikri Baharuddin, Mohammad Farid Naufal

Abstract

Background: Elementary school learning assessment has recently adopted Bloom's Taxonomy, an educational framework that categorizes levels of cognitive learning and thinking skills, as its foundation. The assessment now includes Higher Order Thinking Skills (HOTS) questions, with a specific focus on Indonesian topics. Under this system, teachers must classify questions manually, which consumes considerable time and resources. Automated classification is therefore needed to streamline the process. Although question classification has been studied extensively, there is still room to improve performance, particularly precision and accuracy. Many studies have applied deep learning natural language processing (NLP) models such as BERT to classification, and IndoBERT is one such model, pre-trained for Indonesian text analysis.

Objective: This research aims to build a system that classifies Indonesian multiple-choice exam questions according to Bloom's Taxonomy using the pre-trained IndoBERT model.

Methods: The methodology includes hyperparameter tuning during fine-tuning to identify the best-performing model. Performance was evaluated on accuracy, F1 score, precision, recall, and the time required to train and validate the model.

Results: The fine-tuned IndoBERT model achieved 97% accuracy, 97% F1 score, 97% recall, and 98% precision, with an average training time of 1.55 seconds per epoch and an average validation time of 0.38 seconds per epoch.

Conclusion: The fine-tuned IndoBERT model showed relatively high classification performance, indicating that the system is capable of classifying Indonesian exam questions at the elementary school level.
Keywords: IndoBERT, Fine Tuning, Indonesian Exam Question, Model Classifier, Natural Language Processing, Bloom’s Taxonomy
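
The abstract reports accuracy, precision, recall, and F1 for a multi-class task but does not say how the per-class scores were averaged. The following is a minimal plain-Python sketch of one common convention, macro averaging over the class labels. The label names C1 to C6 (the six cognitive levels of the revised Bloom's Taxonomy, Remember through Create), the helper name `macro_metrics`, and the toy predictions are illustrative assumptions, not the authors' label set, code, or data.

```python
from collections import Counter

# Assumption: the six cognitive levels of the revised Bloom's Taxonomy
# (C1 Remember, C2 Understand, C3 Apply, C4 Analyze, C5 Evaluate, C6 Create).
# The abstract does not list the labels the authors actually used.
BLOOM_LEVELS = ["C1", "C2", "C3", "C4", "C5", "C6"]


def macro_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper).

    Every label in `labels` counts equally; a class with no true or predicted
    examples contributes 0 to each average.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1  # correct prediction for this class
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the true class gets a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),  # share of exact matches
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not F1 of the means
    }


if __name__ == "__main__":
    # Toy predictions for illustration only; not data from the paper.
    y_true = ["C1", "C1", "C2", "C3", "C4", "C5", "C6", "C2"]
    y_pred = ["C1", "C2", "C2", "C3", "C4", "C5", "C6", "C2"]
    report = macro_metrics(y_true, y_pred, BLOOM_LEVELS)
    print({name: round(value, 4) for name, value in report.items()})
    # {'accuracy': 0.875, 'precision': 0.9444, 'recall': 0.9167, 'f1': 0.9111}
```

Macro averaging weights each Bloom level equally regardless of how many questions it contains. A support-weighted average is the other common choice, and the abstract does not state which one produced the reported scores.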
