Multiclass Forecasting on Panel Data Using Autoregressive Multinomial Logit and C5.0 Decision Tree

Journal impact factor: 1.1; JCR quartile: Q3 (Statistics & Probability)
Muhlis Ardiansyah, Hari Wijayanto, Anang Kurnia, A. Djuraidah
Pakistan Journal of Statistics and Operation Research. Published 2023-03-06. DOI: 10.18187/pjsor.v19i1.4053 (https://doi.org/10.18187/pjsor.v19i1.4053)
Citations: 1

Abstract

Panel data methods are mostly applied to numerical response variables, and literature on forecasting categorical variables in a panel data structure remains scarce. Such forecasts matter because they can inform government policy. This study aimed to forecast a multiclass (categorical) variable on panel data. Two forecasting models were proposed: autoregressive multinomial logit and autoregressive C5.0. To make both models usable for forecasting, autoregressive effects were added together with fixed predictor variables such as location, time, strata, and month of observation. The autoregressive effect was assumed to be a fixed effect and treated as a dummy variable. The data were land condition categories from the Area Sampling Frame (ASF) survey conducted by BPS-Statistics Indonesia. Both models were evaluated on classification and forecasting performance. Classification performance was measured by splitting the dataset into 75% training data for modeling and 25% test data for validation, repeated 200 times. The autoregressive C5.0 reached a classification accuracy of 86.48%, against 83.97% for the autoregressive multinomial logit. Forecasting performance was compared by splitting the data into training and test sets according to time order. Forecasting performance was worse than classification performance: autoregressive C5.0 had an accuracy of 77.43%, while autoregressive multinomial logit had 77.77%.
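
To make the autoregressive idea in the abstract concrete, here is a minimal sketch of how the previous period's category could be encoded as dummy variables and how a time-ordered train/test split could be formed. It is an illustration, not the authors' implementation. The category labels, the field names (`segment`, `period`, `category`), and the helper names `add_lag_dummies` and `time_split` are all hypothetical. The sketch assumes each unit is observed in consecutive periods, and it leaves out the other predictors the abstract mentions (location, strata, month).

```python
from collections import defaultdict

# Hypothetical category labels; the abstract does not list the real ASF codes.
CATEGORIES = ["vegetative", "generative", "harvest", "fallow"]


def add_lag_dummies(records, categories=CATEGORIES):
    """Attach dummy variables for the previous period's category.

    records: list of dicts with keys 'segment', 'period', 'category'.
    The first category serves as the reference level, so it gets no dummy.
    The first observation of each segment is dropped because it has no lag.
    Assumes the periods within a segment are consecutive.
    """
    by_segment = defaultdict(list)
    for rec in records:
        by_segment[rec["segment"]].append(rec)

    rows = []
    for segment, recs in by_segment.items():
        recs = sorted(recs, key=lambda r: r["period"])
        # Pair each observation with the one immediately before it.
        for prev, cur in zip(recs, recs[1:]):
            dummies = {
                f"lag_{c}": int(prev["category"] == c) for c in categories[1:]
            }
            rows.append(
                {
                    "segment": segment,
                    "period": cur["period"],
                    "target": cur["category"],
                    **dummies,
                }
            )
    return rows


def time_split(rows, cutoff):
    """Split by time order: periods before `cutoff` train, the rest test."""
    train = [r for r in rows if r["period"] < cutoff]
    test = [r for r in rows if r["period"] >= cutoff]
    return train, test


# Toy panel: two segments observed over three periods.
records = [
    {"segment": "A", "period": 1, "category": "vegetative"},
    {"segment": "A", "period": 2, "category": "generative"},
    {"segment": "A", "period": 3, "category": "harvest"},
    {"segment": "B", "period": 1, "category": "fallow"},
    {"segment": "B", "period": 2, "category": "vegetative"},
    {"segment": "B", "period": 3, "category": "vegetative"},
]

rows = add_lag_dummies(records)
train, test = time_split(rows, cutoff=3)
```

The resulting rows could then be passed to any multiclass classifier, such as a multinomial logit or a decision tree. Splitting by `period` rather than at random keeps future observations out of the training set, which is the distinction the abstract draws between forecasting and classification performance.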
Source journal
CiteScore: 3.30
Self-citation rate: 26.70%
Articles published: 53
About the journal: Pakistan Journal of Statistics and Operation Research (PJSOR) is a peer-reviewed journal published four times a year. It publishes refereed research articles and studies describing the latest research and developments in statistics, operations research, and actuarial statistics.