Multiclass Forecasting on Panel Data Using Autoregressive Multinomial Logit and C5.0 Decision Tree

Impact factor: 1.1 | JCR: Q3 | Category: Statistics & Probability

Pakistan Journal of Statistics and Operation Research. Pub Date: 2023-03-06. DOI: 10.18187/pjsor.v19i1.4053

Muhlis Ardiansyah, Hari Wijayanto, Anang Kurnia, A. Djuraidah


Citations: 1

Abstract


Panel data are commonly used with numerical response variables, while literature on forecasting categorical variables in a panel data structure is still hard to find. Such forecasts matter because they can inform government policy. This study aimed to forecast a multiclass (categorical) variable on a panel data structure. The proposed forecasting models were autoregressive multinomial logit and autoregressive C5.0. To make the two models usable for forecasting, autoregressive effects were added alongside fixed predictor variables such as location, time, strata, and month of observation. The autoregressive effect was assumed to be a fixed effect and was treated as a dummy variable. The data were land condition categories from the Area Sampling Frame (ASF) survey conducted by BPS-Statistics Indonesia. Both models were evaluated on classification performance and on forecasting performance. Classification performance was obtained by splitting the dataset into 75% training data for modeling and 25% test data for validation, with the split repeated 200 times. The autoregressive C5.0 reached a classification accuracy of 86.48%, against 83.97% for the autoregressive multinomial logit. Forecasting performance was compared by splitting the data into training and test sets according to the time sequence. Forecasting performance was worse than classification performance: autoregressive C5.0 had an accuracy of 77.43%, while autoregressive multinomial logit had 77.77%.
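The abstract describes two mechanics that a small example may make more concrete: encoding the previous period's category as dummy variables (the autoregressive effect), and the two evaluation schemes (a random 75%/25% split repeated 200 times, and a split by time order). The following is a minimal sketch under stated assumptions, not the authors' implementation. The class labels `A`, `B`, `C`, the toy panel, the lag of one period, and all function names are hypothetical; the paper's actual variables, lag structure, and model fitting are not reproduced here.

```python
import random

CATEGORIES = ["A", "B", "C"]  # hypothetical class labels, not taken from the paper


def make_toy_panel(n_units=4, n_periods=5):
    """Build a deterministic toy panel of (unit, period, category) tuples."""
    return [
        (u, t, CATEGORIES[(u + t) % len(CATEGORIES)])
        for u in range(n_units)
        for t in range(1, n_periods + 1)
    ]


def add_lag_dummies(panel, categories):
    """Attach one-hot dummies of each unit's category in the previous period.

    Assumption: a single lag of one period. Rows without a previous
    observation (the first period of each unit) are dropped.
    """
    lookup = {(u, t): c for u, t, c in panel}
    rows = []
    for u, t, c in panel:
        prev = lookup.get((u, t - 1))
        if prev is None:
            continue
        lag = [int(prev == k) for k in categories]
        rows.append({"unit": u, "period": t, "lag": lag, "y": c})
    return rows


def random_split(rows, train_frac=0.75, seed=0):
    """Shuffle rows and cut them into train/test parts (classification setting)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def time_split(rows, last_train_period):
    """Train on periods up to last_train_period, test on later ones (forecasting setting)."""
    train = [r for r in rows if r["period"] <= last_train_period]
    test = [r for r in rows if r["period"] > last_train_period]
    return train, test


rows = add_lag_dummies(make_toy_panel(), CATEGORIES)

# Classification-style evaluation: 200 repeated random 75/25 splits.
splits = [random_split(rows, seed=s) for s in range(200)]

# Forecasting-style evaluation: every test period lies after every training period.
train_t, test_t = time_split(rows, last_train_period=3)
```

In the random split, training and test rows can come from the same periods, whereas the time-ordered split forces the model to predict periods it has never seen. This is one plausible reason the reported forecasting accuracy is lower than the classification accuracy, although the abstract does not state a cause. Any classifier that accepts dummy-coded predictors could be fitted on the `lag` features together with the other fixed predictors.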


Source journal

Pakistan Journal of Statistics and Operation Research (Statistics & Probability)

CiteScore: 3.30

Self-citation rate: 26.70%

About the journal: Pakistan Journal of Statistics and Operation Research (PJSOR) is a peer-reviewed journal published four times a year. It publishes refereed research articles and studies that describe the latest research and developments in statistics, operation research, and actuarial statistics.