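Machine learning classifiers to detect data pattern change of continuous emission monitoring system: A typical chemical industrial park as an example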

IF 9.7 · Zone 1 (Environmental Science and Ecology) · JCR Q1 (Environmental Sciences)
Zhefeng Xu, Xiahong Shi, Wei Shu, Yilu Xin, Xuan Zan, Zhaonian Si, Jinping Cheng
Journal: Environment International, volume 201, Article 109594
Published: 2025-06-07
DOI: 10.1016/j.envint.2025.109594
Source: https://www.sciencedirect.com/science/article/pii/S0160412025003459
Citations: 0

Abstract

Continuous Emission Monitoring Systems (CEMS) are critical for real-time pollutant measurement and are widely deployed to supervise industrial emissions and ensure regulatory compliance. Despite their utility, CEMS data are vulnerable to fabrication, which complicates the detection of environmental violations; such violations may nevertheless be revealed by changes in emission patterns. This study explores the application of machine learning classifiers to analyse CEMS data from 107 waste discharge outlets across 31 corporations in a Chinese chemical industrial park. The outlets were grouped into 12 datasets according to their monitoring parameters, and 17 machine learning models were evaluated for identifying emission patterns and detecting potential data anomalies. Random Forest classifiers (RFCs) consistently demonstrated high accuracy (up to 100% in specific datasets) and outperformed the other models, while gradient boosting-based methods also performed well. Temporal emission pattern analysis revealed significant changes in 334 instances (90% confidence) across collection weeks, though only 24 aligned with regulatory offsite supervision records, highlighting discrepancies between algorithmic detection and traditional compliance checks. Vector distances and cosine similarities of mean/median emission values correlated with misprediction probabilities, yet fewer than 60% of pattern changes coincided with extremum values in these metrics. The study underscores the efficacy of RFCs in distinguishing outlet-specific emission profiles and proposes a supplemental approach to uncover subtle data manipulation or operational shifts. However, challenges persist in linking algorithmic findings to documented violations, which emphasizes the need for integrated data frameworks to enhance environmental oversight. This work advances the role of machine learning classifiers in emission monitoring, offering a pathway for CEMS management and regulatory strategy refinement.
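
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. A classifier learns which outlet a record belongs to, and a later week at that outlet is flagged when too many of its records are attributed to a different outlet. A simple nearest-centroid rule stands in for the Random Forest classifier used in the study, so that the example needs only the Python standard library. The outlet names, readings, and the 0.5 threshold are all hypothetical and are not taken from the paper. The sketch also computes the Euclidean distance and cosine similarity between the baseline and weekly mean emission vectors, the two metrics the abstract relates to misprediction probability.

```python
import math
from statistics import mean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric records."""
    return [mean(col) for col in zip(*rows)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def nearest_centroid(record, centroids):
    """Stand-in classifier: the outlet whose baseline centroid is closest."""
    return min(centroids, key=lambda outlet: math.dist(record, centroids[outlet]))


def misprediction_rate(records, true_outlet, centroids):
    """Share of records that the classifier assigns to some other outlet."""
    wrong = sum(nearest_centroid(r, centroids) != true_outlet for r in records)
    return wrong / len(records)


# Hypothetical baseline records: two outlets, two monitored parameters each.
baseline = {
    "outlet_A": [[10.0, 200.0], [11.0, 210.0], [9.0, 190.0]],
    "outlet_B": [[40.0, 50.0], [42.0, 55.0], [38.0, 45.0]],
}
centroids = {outlet: mean_vector(rows) for outlet, rows in baseline.items()}

# Hypothetical later week at outlet_A: two of three records resemble outlet_B.
later_week_A = [[39.0, 60.0], [41.0, 52.0], [10.0, 205.0]]

rate = misprediction_rate(later_week_A, "outlet_A", centroids)
week_mean = mean_vector(later_week_A)
dist = math.dist(centroids["outlet_A"], week_mean)
cos = cosine_similarity(centroids["outlet_A"], week_mean)

THRESHOLD = 0.5  # illustrative cut-off, not the paper's confidence criterion
changed = rate > THRESHOLD

print(f"misprediction rate={rate:.2f}, distance={dist:.1f}, "
      f"cosine={cos:.3f}, pattern change flagged={changed}")
```

In this toy data the week is flagged because most records are misattributed, while the cosine similarity of the mean vectors stays fairly high. This loosely mirrors the abstract's observation that summary metrics on mean/median values do not always peak where a pattern change is detected.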
Source journal: Environment International (Environmental Sciences)
CiteScore: 21.90
Self-citation rate: 3.40%
Articles published: 734
Review time: 2.8 months