Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data in poverty classification

Q3 Decision Sciences
Firza Refo Adi Pratama, S. I. Oktora
{"title":"Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data in poverty classification","authors":"Firza Refo Adi Pratama, S. I. Oktora","doi":"10.3233/sji-220080","DOIUrl":null,"url":null,"abstract":"Poverty data in official statistics data is important for development planning. The lower percentage of the poor recorded yearly indicates good development of a country. Moreover, there is always a problem when performing an inferential and classification analysis because of the imbalanced data, thereby leading to biases in the estimation results and prediction errors in the classification. One of the solutions to this problem is using Synthetic Minority Over-sampling Technique (SMOTE). Therefore, this study aims to evaluate the inference and classification quality using the binary logistic regression model without and with SMOTE. The data utilized was the poverty status of households in the rural and urban areas in East Java, Indonesia as contained in the 2019 National Socio-Economic Survey. Furthermore, the variables used are poverty status of the household, the age of the household head (HH), the ratio of household members who are employed, gender of the HH, number of household members, education level of HH, and occupation of the HH. 
It was concluded that the model with SMOTE approach was better at inference and classifying the results.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Journal of the IAOS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/sji-220080","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Decision Sciences","Score":null,"Total":0}

Abstract

Poverty data in official statistics are important for development planning. A lower percentage of the poor recorded each year indicates good development in a country. However, inferential and classification analyses of such data are routinely affected by class imbalance, which leads to biased estimation results and prediction errors in classification. One solution to this problem is the Synthetic Minority Over-sampling Technique (SMOTE). This study therefore evaluates the quality of inference and classification of a binary logistic regression model fitted without and with SMOTE. The data are the poverty status of households in rural and urban areas of East Java, Indonesia, taken from the 2019 National Socio-Economic Survey. The variables used are the poverty status of the household, the age of the household head (HH), the ratio of household members who are employed, the gender of the HH, the number of household members, the education level of the HH, and the occupation of the HH. The study concludes that the model with SMOTE performed better in both inference and classification.
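The oversampling step named in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch handles numeric features only, whereas several of the study's variables (gender, education, occupation) are categorical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data, NOT the survey data:
# [age of household head, ratio of employed household members]
poor = [[35.0, 0.20], [42.0, 0.25], [51.0, 0.10], [38.0, 0.33]]
non_poor = [[30.0 + i, 0.5] for i in range(20)]

# Create enough synthetic "poor" households to match the majority class size
new_poor = smote(poor, n_synthetic=len(non_poor) - len(poor), k=3)
balanced_poor = poor + new_poor
print(len(balanced_poor), len(non_poor))  # prints "20 20"
```

In a workflow like the one the abstract describes, the balanced data would then be passed to a binary logistic regression and compared against a model fitted on the original data. A common precaution is to oversample only the training portion, so that the evaluation data contain no synthetic records.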
Source journal
Statistical Journal of the IAOS
Subject area: Economics, Econometrics and Finance; Economics and Econometrics
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 116
About the journal: This is the flagship journal of the International Association for Official Statistics and is expected to be widely circulated and subscribed to by individuals and institutions in all parts of the world. The main aim of the Journal is to support the IAOS mission by publishing articles to promote the understanding and advancement of official statistics and to foster the development of effective and efficient official statistical services on a global basis. Papers are expected to be of wide interest to readers. Such papers may or may not contain strictly original material. All papers are refereed.