Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data in poverty classification

Q3 Decision Sciences

Statistical Journal of the IAOS · Pub Date: 2022-12-25 · DOI: 10.3233/sji-220080

Firza Refo Adi Pratama, S. I. Oktora


Citations: 0

Abstract




Poverty data in official statistics are important for development planning. A lower percentage of poor people recorded each year indicates that a country is developing well. However, the resulting class imbalance causes problems for inferential and classification analysis: it biases the estimation results and leads to prediction errors in classification. One solution to this problem is the Synthetic Minority Over-sampling Technique (SMOTE). This study therefore evaluates the quality of inference and classification using a binary logistic regression model with and without SMOTE. The data are the poverty status of households in rural and urban areas of East Java, Indonesia, taken from the 2019 National Socio-Economic Survey. The variables are the poverty status of the household, the age of the household head (HH), the ratio of household members who are employed, the gender of the HH, the number of household members, the education level of the HH, and the occupation of the HH. The study concludes that the model with SMOTE performed better at both inference and classification.
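The abstract names SMOTE but does not describe how it creates new minority samples. The sketch below illustrates the core idea under stated assumptions: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy values are hypothetical and chosen for illustration only; they are not the authors' code or the survey data. This basic form handles numeric features only, so categorical variables such as gender or occupation would need a variant designed for mixed data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    minority: list of equal-length numeric tuples (minority-class samples).
    k: number of nearest minority neighbours considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up toy data: (age of household head, number of household members)
poor = [(52.0, 6.0), (47.0, 7.0), (60.0, 5.0), (39.0, 8.0)]
non_poor_count = 20  # hypothetical size of the majority class

new_points = smote(poor, n_synthetic=non_poor_count - len(poor), k=3)
balanced_poor = poor + new_points
print(len(balanced_poor))  # 20: minority class now matches the majority size
```

In a workflow like the one the abstract describes, oversampling of this kind would typically be applied before fitting the logistic regression, and only to the data used for fitting, so that evaluation still reflects the original class proportions. That ordering is a common practice and an assumption here, not a detail stated in the abstract.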


Source journal

Statistical Journal of the IAOS Economics, Econometrics and Finance-Economics and Econometrics

CiteScore: 1.30

Self-citation rate: 0.00%

Articles published: 116

About the journal: This is the flagship journal of the International Association for Official Statistics and is expected to be widely circulated and subscribed to by individuals and institutions in all parts of the world. The main aim of the Journal is to support the IAOS mission by publishing articles to promote the understanding and advancement of official statistics and to foster the development of effective and efficient official statistical services on a global basis. Papers are expected to be of wide interest to readers. Such papers may or may not contain strictly original material. All papers are refereed.