Multi-class random forest model to classify wastewater treatment imbalanced data

IF 6.2 | Zone 2, Economics | Q1 ECONOMICS
Socio-economic Planning Sciences, Journal Article, published 2024-07-20. DOI: 10.1016/j.seps.2024.102021. https://www.sciencedirect.com/science/article/pii/S0038012124002209
Citations: 0

Abstract

The odor emissions generated by treatment plants raise complex environmental and economic issues. Modern instrumental odor monitoring systems, based on an array of several sensors, continuously record the gaseous compounds. However, they are characterized by poor selectivity, which compromises their ability to discriminate and identify the emission sources. In this paper, the ability of odor sensors to distinguish between the treatment plant sections that generate the gaseous compounds is evaluated using a random forest classifier and compared with the performance of discriminant analysis. Because a multi-parametric sensor system can be affected by a small sample size with imbalanced classes, several data-balancing strategies are proposed and analyzed. The findings show that the random forest classifier distinguishes the emission sources better than classical multiple discriminant analysis on all evaluation metrics. This result holds across different resampling techniques, especially over-sampling. The analysis uses measurements from 10 sensors of multi-parametric odor monitoring systems, obtained from a company specializing in environmental assistance.
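The abstract does not say which balancing strategies or evaluation metrics were used. As a minimal sketch, assuming plain random over-sampling and random under-sampling (other techniques such as SMOTE also exist) and class-averaged metrics, the pure-Python code below shows how an imbalanced training set could be balanced and why per-class averaging matters when one plant section dominates the data. The function names, class labels, and toy data are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    # Hypothetical helper: collect the feature rows belonging to each class label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows (with replacement) of the smaller classes
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class so that every
    class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def macro_metrics(y_true, y_pred):
    """Balanced accuracy (mean per-class recall over the true classes) and
    macro F1 (mean per-class F1 over all seen labels): each class counts
    equally, so a rare plant section weighs as much as the dominant one."""
    true_labels = sorted(set(y_true))
    all_labels = sorted(set(y_true) | set(y_pred))
    recall, f1 = {}, {}
    for c in all_labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        recall[c] = rec
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {
        "balanced_accuracy": sum(recall[c] for c in true_labels) / len(true_labels),
        "macro_f1": sum(f1.values()) / len(all_labels),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 3 sensor readings per sample, three plant sections.
    X = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(8)]
    y = ["aeration"] * 5 + ["sludge"] * 2 + ["inlet"]
    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over))   # every section now has 5 samples
    print(Counter(y_under))  # every section now has 1 sample
    # A classifier that always predicts the majority section scores 0.8 plain
    # accuracy here, but only 0.5 balanced accuracy.
    print(macro_metrics(["aeration"] * 4 + ["inlet"], ["aeration"] * 5))
```

In a full pipeline, resampling would be applied to the training split only, so the test split keeps its real class proportions. The balanced training data could then be passed to, for example, scikit-learn's `RandomForestClassifier` and `LinearDiscriminantAnalysis` to set up a comparison of the kind the abstract describes.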

Source journal
Socio-economic Planning Sciences (OPERATIONS RESEARCH & MANAGEMENT SCIENCE)
CiteScore: 9.40
Self-citation rate: 13.10%
Articles published: 294
Review time: 58 days
Journal description: Studies directed toward the more effective utilization of existing resources, e.g. mathematical programming models of health care delivery systems with relevance to more effective program design; systems analysis of fire outbreaks and their relevance to the location of fire stations; statistical analysis of the efficiency of a developing country's economy or industry. Studies relating to the interaction of various segments of society and technology, e.g. the effects of government health policies on the utilization and design of hospital facilities; the relationship between housing density and the demands on public transportation or other service facilities; patterns and implications of urban development and air or water pollution. Studies devoted to anticipating and responding to future needs for social, health and other human services, e.g. the relationship between industrial growth and the development of educational resources in affected areas; investigation of future demands for maternal and child health resources in a developing country; design of effective recycling in an urban setting.