Data augmentation using conditional generative adversarial network (cGAN): applications for sewer condition classification and testing using different machine learning techniques

IF 2.2 · Zone 3, Engineering and Technology · JCR Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Haile Woldesellasse, Solomon Tesfamariam
Journal of Hydroinformatics · Journal Article · Published 2024-06-13 · DOI: 10.2166/hydro.2024.135 (https://doi.org/10.2166/hydro.2024.135)

Abstract

The increasing availability of condition assessment data highlights the challenge of managing data imbalance in the asset management of aging infrastructure. Aging sewer pipes pose significant threats to health and the environment, underscoring the importance of proactive management practices to enhance asset maintenance and mitigate associated risks. While machine learning (ML) models are widely employed to model the complex deterioration process of sewer pipes, their performance is limited when they are trained on imbalanced condition grade data. This paper addresses this issue by proposing a novel approach that uses a conditional generative adversarial network (cGAN) for data augmentation. Generating synthetic data for the minority classes balances the skewed distribution of the sewer dataset, facilitating more robust and accurate predictive models. The utility of the proposed method is evaluated by training different ML classifiers, including neural network (NN), decision tree, quadratic discriminant analysis, Naïve Bayes, support vector machine (SVM), and K-nearest neighbor. The quadratic discriminant, Naïve Bayes, NN, and SVM classifiers demonstrated improvement. The cGAN-based data augmentation method also outperformed two other techniques for handling data imbalance: random under-sampling and cost-sensitive NN. Consequently, data generated by cGAN can effectively aid asset management by supporting the development of proactive classifiers that accurately predict pipes at a high risk of failure.
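To make the imbalance problem concrete, the following is a minimal sketch of the two baseline techniques named in the abstract (random under-sampling and cost-sensitive weighting), plus a count of how many synthetic rows an augmenter such as a cGAN would have to generate to balance the classes. It does not implement the cGAN itself and is not the authors' code. The toy data, the 80/15/5 class split, the four features, and all function names are assumptions made for illustration; the inverse-frequency weighting is one common cost-sensitive choice, not necessarily the one used in the paper.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until every class matches the rarest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

def inverse_frequency_weights(y):
    """Per-class loss weights n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def synthetic_rows_needed(y):
    """Rows per class an augmenter would need to add to match the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    return dict(zip(classes.tolist(), (counts.max() - counts).tolist()))

# Hypothetical toy data: three condition grades with a skewed 80/15/5 split.
rng = np.random.default_rng(42)
y = np.repeat([1, 2, 3], [80, 15, 5])
X = rng.normal(size=(y.size, 4))  # four made-up pipe features

X_rus, y_rus = random_undersample(X, y)
print("after under-sampling:", np.bincount(y_rus)[1:])
print("class weights:", inverse_frequency_weights(y))
print("synthetic rows needed:", synthetic_rows_needed(y))
```

Under-sampling discards most of the majority-class rows, and weighting leaves the data unchanged but rescales the loss; augmentation instead adds rows to the minority classes, which is the gap the abstract says the cGAN is meant to fill.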
Source journal
Journal of Hydroinformatics (Engineering and Technology: Civil Engineering)
CiteScore: 4.80
Self-citation rate: 3.70%
Articles published: 59
Review time: 3 months
About the journal: Journal of Hydroinformatics is a peer-reviewed journal devoted to the application of information technology in the widest sense to problems of the aquatic environment. It promotes Hydroinformatics as a cross-disciplinary field of study, combining technological, human-sociological and more general environmental interests, including an ethical perspective.