Infilling missing data and outliers for a conventional sewage treatment plant using a self-organizing map: a case study of Kauma Sewage Treatment Plant in Lilongwe, Malawi

IF 1.5 Q4 WATER RESOURCES
Madalitso Mng’ombe, B. Chunga, Eddie W. Mtonga, R. Chidya, M. Malota
Journal: H2Open Journal
Published: 2023-06-02
DOI: 10.2166/h2oj.2023.013 (https://doi.org/10.2166/h2oj.2023.013)
Citation count: 0

Abstract

Data availability is key to modeling wastewater treatment processes. However, process data are characterized by missing values and outliers. This study applied a self-organizing map (SOM) to fill in missing values and replace outliers in wastewater treatment data from Kauma Sewage Treatment Plant in Lilongwe, Malawi. We used primary and secondary wastewater data and executed the SOM algorithm to fill missing values and replace outliers in effluent pH, biochemical oxygen demand (BOD), and dissolved oxygen (DO). The results suggest that the SOM algorithm is reliable in filling gaps in wastewater time series data with less than 50% missing values, with correlation coefficient (R) values of >0.90. The SOM algorithm failed to reliably fill gaps and replace outliers in time series data with >50% missing values. For instance, high mean square error (MSE) values of 3,655.57, 10.62, and 2,153.34 for pH, DO, and BOD, respectively, were registered in datasets with more than 50% missing values, while very small MSE values (MSE ≈ 0) were associated with effluent pH, BOD, and DO data with less than 50% missing values. Practitioners can use this approach to improve the planning and management of wastewater treatment facilities where available data records are riddled with missing observations.
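The abstract does not give implementation details, so the following is only a minimal sketch of how SOM-based gap filling is commonly done, not the authors' code. It assumes missing values are stored as NaN, that the best-matching unit (BMU) is found using only the observed variables of each record, and that gaps are filled from the BMU's weight vector. The data are synthetic (three correlated columns standing in for pH, DO, and BOD), and the function names `train_som` and `som_impute`, the 4x4 grid, and the learning-rate schedule are all illustrative assumptions.

```python
import numpy as np

def train_som(X, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on data containing NaNs.

    Assumption: distances and weight updates use only the observed
    (non-NaN) features of each record.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise weights near the column means of the observed data.
    W = np.nanmean(X, axis=0) + 0.1 * np.nanstd(X, axis=0) * rng.standard_normal((rows * cols, d))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = X[i]
            obs = ~np.isnan(x)
            frac = step / total
            step += 1
            if not obs.any():
                continue  # nothing to learn from an empty record
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            bmu = np.argmin(((W[:, obs] - x[obs]) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            W[:, obs] += lr * h[:, None] * (x[obs] - W[:, obs])
    return W

def som_impute(X, W):
    """Fill each NaN with the matching component of the record's BMU."""
    X_filled = X.copy()
    for i, x in enumerate(X):
        obs = ~np.isnan(x)
        if obs.all() or not obs.any():
            continue
        bmu = np.argmin(((W[:, obs] - x[obs]) ** 2).sum(axis=1))
        X_filled[i, ~obs] = W[bmu, ~obs]
    return X_filled

# Synthetic stand-in data (NOT the Kauma plant records).
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, 300)
truth = np.column_stack([
    t + 0.05 * rng.standard_normal(300),
    2.0 * t + 0.05 * rng.standard_normal(300),
    -t + 0.05 * rng.standard_normal(300),
])

# Hide about 20% of the entries, keeping at least one value per record.
mask = rng.random(truth.shape) < 0.2
mask[mask.all(axis=1), 0] = False
X = truth.copy()
X[mask] = np.nan

W = train_som(X)
filled = som_impute(X, W)

# Score the filled values against the hidden truth, as in the abstract.
mse = np.mean((filled[mask] - truth[mask]) ** 2)
r = np.corrcoef(filled[mask], truth[mask])[0, 1]
print(f"MSE = {mse:.4f}, R = {r:.3f}")
```

Raising the masking fraction above 0.5 in this toy setup is one way to explore how the fill quality changes as gaps grow, though the numbers will not correspond to those reported for the plant data. Outliers could be handled in the same pass by first setting flagged values to NaN; how the authors flagged them is not stated in the abstract. With real measurements on different scales, standardizing each variable before training would likely be needed.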
Source journal: H2Open Journal (Environmental Science: Environmental Science (miscellaneous))
CiteScore: 3.30
Self-citation rate: 4.80%
Articles published: 47
Review time: 24 weeks