Addressing gaps in data on drinking water quality through data integration and machine learning: evidence from Ethiopia
Alemayehu A. Ambel, Robert Bain, Tefera Bekele Degefu, Ayca Donmez, Richard Johnston, Tom Slaymaker
npj Clean Water, pp. 1-9 (journal article, published 2023-09-08)
DOI: 10.1038/s41545-023-00272-8
Full text: https://www.nature.com/articles/s41545-023-00272-8
Citation count: 0
Abstract
Monitoring access to safely managed drinking water services requires information on water quality. An increasing number of countries have integrated water quality testing into household surveys; however, such tests are not expected to be included in all future surveys. Using water testing data from the 2016 Ethiopia Socio-Economic Survey (ESS), we developed predictive models based on common machine learning classification algorithms to identify households using contaminated drinking water sources (≥1 E. coli per 100 mL). These models were then applied to the 2013–2014 and 2018–2019 waves of the ESS, which did not include water testing. The best-performing model achieved good accuracy (88.5%; 95% CI 86.3%, 90.6%) and discrimination (AUC 0.91; 95% CI 0.89, 0.94). A model using demographic, socioeconomic, and geospatial variables performed comparably to the full-features model, whereas a model based exclusively on water source type performed poorly. Drinking water quality at the point of collection can be predicted from demographic, socioeconomic, and geospatial variables that are often available in household surveys.
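The abstract reports model performance as accuracy and AUC, each with a 95% confidence interval. The sketch below shows one way to compute those quantities for a binary "contaminated" outcome defined by the abstract's ≥1 E. coli per 100 mL threshold. It is not the authors' pipeline, and several pieces are illustrative assumptions:

- the ten households' counts and model scores;
- the 0.5 decision cutoff;
- the percentile-bootstrap CI method (the abstract does not say how its intervals were derived);
- the helper names `label_contaminated`, `roc_auc` and `bootstrap_ci`, which are hypothetical.

```python
import random
from typing import Callable, Sequence

# Outcome definition from the abstract: a sample is "contaminated" when it
# contains at least 1 E. coli per 100 mL.
CONTAMINATION_THRESHOLD = 1.0


def label_contaminated(ecoli_per_100ml: Sequence[float]) -> list[int]:
    """Return 1 for samples at or above the threshold, 0 otherwise."""
    return [1 if c >= CONTAMINATION_THRESHOLD else 0 for c in ecoli_per_100ml]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of households whose predicted class matches the tested class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random contaminated sample gets a higher
    score than a random uncontaminated one (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    metric: Callable[[list[int], list], float],
    y_true: Sequence[int],
    values: Sequence,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap (assumed method): resample households with
    replacement, recompute the metric, take the alpha/2 and 1 - alpha/2
    quantiles of the resampled values."""
    if len(set(y_true)) < 2:
        raise ValueError("need both contaminated and uncontaminated samples")
    rng = random.Random(seed)
    n = len(y_true)
    stats: list[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:  # AUC is undefined without both classes
            continue
        stats.append(metric(yt, [values[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical test results (E. coli per 100 mL) and model scores for ten households.
counts = [0, 0, 3, 12, 0, 1, 0, 25, 0, 2]
scores = [0.10, 0.30, 0.80, 0.90, 0.20, 0.45, 0.55, 0.70, 0.05, 0.60]
y_true = label_contaminated(counts)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # 0.80
print(f"AUC      = {roc_auc(y_true, scores):.2f}")   # 0.96
print("AUC 95% CI:", bootstrap_ci(roc_auc, y_true, scores))
```

The pairwise loop makes the rank interpretation of AUC explicit. A library routine such as scikit-learn's `roc_auc_score` gives the same value and scales better to survey-sized samples.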
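Journal: npj Clean Water (Environmental Science: Water Science and Technology)
Impact factor: 10.4 (JCR Q1, Engineering, Chemical)
CiteScore: 15.30
Self-citation rate: 2.60%
Articles published: 61
Review time: 5 weeks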
About the journal:
npj Clean Water publishes high-quality papers that report cutting-edge science, technology, applications, policies, and societal issues contributing to a more sustainable supply of clean water. The journal's publications may also support and accelerate the achievement of Sustainable Development Goal 6, which focuses on clean water and sanitation.