Addressing gaps in data on drinking water quality through data integration and machine learning: evidence from Ethiopia

IF 10.4 | Zone 1, Engineering and Technology | JCR Q1, ENGINEERING, CHEMICAL
Alemayehu A. Ambel, Robert Bain, Tefera Bekele Degefu, Ayca Donmez, Richard Johnston, Tom Slaymaker
npj Clean Water, pp. 1-9. Published 2023-09-08. DOI: 10.1038/s41545-023-00272-8
https://www.nature.com/articles/s41545-023-00272-8

Abstract

Monitoring access to safely managed drinking water services requires information on water quality. An increasing number of countries have integrated water quality testing into household surveys; however, such tests are not expected to be included in all future surveys. Using water testing data from the 2016 Ethiopia Socio-Economic Survey (ESS), we developed predictive models, based on common machine learning classification algorithms, to identify households using contaminated (≥1 E. coli per 100 mL) drinking water sources. These models were then applied to the 2013–2014 and 2018–2019 waves of the ESS, which did not include water testing. The highest-performing model achieved good accuracy (88.5%; 95% CI 86.3%, 90.6%) and discrimination (AUC 0.91; 95% CI 0.89, 0.94). A model using demographic, socioeconomic, and geospatial variables gave results comparable to those of the full-features model, whereas a model based exclusively on water source type performed poorly. Drinking water quality at the point of collection can be predicted from demographic, socioeconomic, and geospatial variables that are often available in household surveys.
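
The abstract reports accuracy and AUC with 95% confidence intervals. As a rough illustration of how such figures can be computed, the sketch below evaluates a binary contamination classifier on a handful of made-up labels and scores. The data are toy values, not ESS results, and the percentile bootstrap is an assumption: the abstract does not say how the intervals were derived. All function names are hypothetical.

```python
import random

def accuracy(y_true, y_pred):
    """Share of cases whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a contaminated case (1) scores higher
    than an uncontaminated one (0); ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(metric, y_true, y_other, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        if len(set(t)) < 2:
            continue  # AUC is undefined unless both classes are present
        stats.append(metric(t, [y_other[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data for illustration only: 1 = contaminated (>=1 E. coli per 100 mL).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.85, 0.35]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 cut-off is an arbitrary choice

acc = accuracy(y_true, y_pred)
roc_auc = auc(y_true, scores)
acc_ci = bootstrap_ci(accuracy, y_true, y_pred)
auc_ci = bootstrap_ci(auc, y_true, scores)
print(f"accuracy={acc:.2f} CI={acc_ci}, AUC={roc_auc:.2f} CI={auc_ci}")
```

With only ten cases the intervals are very wide; the narrow intervals quoted in the abstract reflect a much larger test sample.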

Source journal: npj Clean Water
Subject area: Environmental Science, Water Science and Technology
CiteScore: 15.30
Self-citation rate: 2.60%
Articles published: 61
Review time: 5 weeks
About the journal: npj Clean Water publishes high-quality papers that report cutting-edge science, technology, applications, policies, and societal issues contributing to a more sustainable supply of clean water. The journal's publications may also support and accelerate the achievement of Sustainable Development Goal 6, which focuses on clean water and sanitation.