Damien Echevin , Guy Fotso , Yacine Bouroubi , Harold Coulombe , Qing Li
Combining survey and census data for improved poverty prediction using semi-supervised deep learning
Journal of Development Economics, Volume 172, Article 103385. Published 2024-10-10. DOI: 10.1016/j.jdeveco.2024.103385
https://www.sciencedirect.com/science/article/pii/S0304387824001342
Abstract
This paper presents a methodology for predicting poverty using semi-supervised learning techniques, specifically pseudo-labeling, and deep learning algorithms. Standard poverty prediction models rely on limited household survey data, whereas our approach exploits large amounts of unlabeled census data to improve prediction accuracy. By applying pseudo-labeling, we improve key performance metrics across various African regions, where our models outperform conventional approaches to identifying poor individuals. Deep neural networks (DNNs) trained on pseudo-labeled data achieved area under the curve (AUC) scores ranging from 0.8 to over 0.9, a notable improvement over previous survey-based machine learning methods. Furthermore, random undersampling was key to refining model performance, raising coverage at the cost of some reduction in precision. These findings have significant implications for poverty targeting, enabling more accurate identification of poor individuals and supporting better resource allocation.
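The two techniques named in the abstract, pseudo-labeling and random undersampling, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic two-feature data, substitutes a small logistic regression for the paper's deep neural networks, and uses a confidence threshold of 0.9 chosen only for illustration. The order of the steps (undersample, fit, pseudo-label, undersample, refit) is likewise an assumption, and all function and variable names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient-descent logistic regression (stand-in for a DNN)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive (poor) class for each row."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def random_undersample(X, y, rng):
    """Randomly drop rows of the larger class until both classes are equal in size."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    k = min(len(pos), len(neg))
    keep = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def pseudo_label(X_lab, y_lab, X_unlab, rng, threshold=0.9):
    """Fit on labeled rows, label confident unlabeled rows, then refit on both."""
    base = fit_logistic(*random_undersample(X_lab, y_lab, rng))
    X_aug, y_aug = list(X_lab), list(y_lab)
    for xi, p in zip(X_unlab, predict_proba(base, X_unlab)):
        # Keep only predictions the base model is confident about (assumed rule).
        if p >= threshold or p <= 1.0 - threshold:
            X_aug.append(xi)
            y_aug.append(1 if p >= 0.5 else 0)
    final = fit_logistic(*random_undersample(X_aug, y_aug, rng))
    return final, len(y_aug) - len(y_lab)


def make_data(n, rng, poor_share=0.25):
    """Synthetic data: poor households (label 1) are the minority class."""
    X, y = [], []
    for _ in range(n):
        poor = 1 if rng.random() < poor_share else 0
        center = -1.5 if poor else 1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(poor)
    return X, y


rng = random.Random(0)
X_survey, y_survey = make_data(200, rng)   # small labeled "survey"
X_census, _ = make_data(1000, rng)         # large "census", labels discarded
X_test, y_test = make_data(500, rng)

model, n_added = pseudo_label(X_survey, y_survey, X_census, rng)
preds = [1 if p >= 0.5 else 0 for p in predict_proba(model, X_test)]
accuracy = sum(int(p == t) for p, t in zip(preds, y_test)) / len(y_test)
print(f"pseudo-labeled census rows added: {n_added}, test accuracy: {accuracy:.3f}")
```

Balancing the classes before fitting tends to push the classifier toward flagging more households as poor, which is one way to read the coverage-versus-precision trade-off the abstract mentions. The synthetic classes here are well separated, so the sketch shows the mechanics only and says nothing about the size of the gains reported in the paper.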
About the journal:
The Journal of Development Economics publishes papers relating to all aspects of economic development - from immediate policy concerns to structural problems of underdevelopment. The emphasis is on quantitative or analytical work, which is relevant as well as intellectually stimulating.