Assessing the Impact of Temporal Resolution Using BSM1 on the Performance of Machine Learning

Wonki O, S. Ki, J. M. Triolo, Seung Gu Shin
Journal of Korean Society of Environmental Engineers. Published 2023-12-31. DOI: 10.4491/ksee.2023.45.12.625 (https://doi.org/10.4491/ksee.2023.45.12.625)

Abstract

Objectives: This study aims to establish efficient strategies for data-driven operational management by examining how machine learning modeling outcomes and data characteristics vary with data acquisition intervals and methods.

Methods: BSM1 (Benchmark Simulation Model No. 1) was used to simulate a wastewater treatment facility and to generate influent and effluent water quality data at 15-minute intervals. The generated data were reduced in volume by downsampling, and their characteristics were examined using resampling techniques, including upsampling by interpolation. The performance of 30 machine learning models built with the downsampled data was then compared.

Results and Discussion: As the data acquisition interval increased (i.e., as downsampling progressed), R² decreased and RMSE increased. Representing each interval by its mean value gave high data accuracy and minimal error. Representing each interval by its maximum value helped preserve data characteristics and reduce information loss. Simple interpolation methods did not improve data accuracy. Furthermore, with wider data acquisition intervals, the practical predictive performance of the machine learning models decreased, and performance declined sharply when data became insufficient.

Conclusion: For models that need to detect changes rather than maximize accuracy, using the maximum value over a specific period is effective. The measurement interval is a significant factor affecting the performance of machine learning models, and models developed under different measurement intervals often fail to deliver the expected performance. In this study, we implemented all stages of data preprocessing, classification, training, and validation in LabVIEW, confirming the potential for integrating data analysis into LabVIEW, a platform widely used in control and measurement.
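
The resampling comparison described in the abstract can be illustrated with a small sketch. This is not the authors' LabVIEW implementation: it is plain Python, the series is a synthetic sine wave with one added peak standing in for BSM1 output, and the function names (`downsample`, `upsample_linear`, `rmse`, `r2`) as well as the block-centre alignment used for interpolation are assumptions made for illustration only.

```python
import math


def downsample(values, factor, how="mean"):
    """Collapse each run of `factor` consecutive samples into one representative value."""
    reducers = {"mean": lambda block: sum(block) / len(block), "max": max}
    reduce_block = reducers[how]
    return [reduce_block(values[i:i + factor]) for i in range(0, len(values), factor)]


def upsample_linear(coarse, factor, n):
    """Linearly interpolate coarse values back onto an n-point fine grid.

    Assumption: each coarse value sits at the centre of the block it summarises;
    fine samples before the first or after the last centre hold the nearest value.
    """
    offset = (factor - 1) / 2
    last = len(coarse) - 1
    out = []
    for i in range(n):
        t = min(max((i - offset) / factor, 0.0), float(last))
        k = int(t)
        nxt = min(k + 1, last)
        out.append(coarse[k] + (coarse[nxt] - coarse[k]) * (t - k))
    return out


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic stand-in for a 15-minute series: 96 samples per day, two days.
SAMPLES_PER_DAY = 96
signal = [10 + 3 * math.sin(2 * math.pi * i / SAMPLES_PER_DAY)
          for i in range(2 * SAMPLES_PER_DAY)]
signal[50] += 8  # one short-lived peak (hypothetical event)

for factor in (4, 16):  # 1-hour and 4-hour acquisition intervals
    by_mean = downsample(signal, factor, "mean")
    by_max = downsample(signal, factor, "max")
    rebuilt = upsample_linear(by_mean, factor, len(signal))
    print(f"{factor * 15:>4} min  RMSE={rmse(signal, rebuilt):.3f}  "
          f"R2={r2(signal, rebuilt):.3f}  peak(mean)={max(by_mean):.2f}  "
          f"peak(max)={max(by_max):.2f}  peak(raw)={max(signal):.2f}")
```

In this toy series the maximum-based representation retains the raw peak by construction, while the mean-based representation averages it away. That mirrors the abstract's distinction between preserving data characteristics and minimizing error, but the sketch says nothing about the magnitudes reported in the study.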