Zhefeng Xu, Xiahong Shi, Wei Shu, Yilu Xin, Xuan Zan, Zhaonian Si, Jinping Cheng
Environment International, Volume 201, Article 109594 (published 2025-06-07). DOI: 10.1016/j.envint.2025.109594. https://www.sciencedirect.com/science/article/pii/S0160412025003459
Machine learning classifiers to detect data pattern change of continuous emission monitoring system: A typical chemical industrial park as an example
Continuous Emission Monitoring Systems (CEMS) are critical for real-time pollutant measurement and are widely deployed to supervise industrial emissions and ensure regulatory compliance. Despite their utility, CEMS data are susceptible to fabrication, which complicates the detection of environmental violations; such violations may nonetheless be revealed by changes in emission patterns. This study explores the application of machine learning classifiers to CEMS data from 107 waste discharge outlets belonging to 31 corporations in a Chinese chemical industrial park. The outlets were grouped into 12 datasets according to their monitoring parameters, and 17 machine learning models were evaluated for their ability to identify emission patterns and detect potential data anomalies. Random Forest classifiers (RFC) consistently achieved high accuracy (up to 100% on specific datasets) and outperformed the other models, while gradient boosting-based methods also performed well. Temporal emission pattern analysis revealed significant changes in 334 instances (90% confidence) across collection weeks, but only 24 of these aligned with regulatory offsite supervision records, highlighting discrepancies between algorithmic detection and traditional compliance checks. Vector distances and cosine similarities of mean/median emission values correlated with misprediction probabilities, yet fewer than 60% of pattern changes coincided with extreme values of these metrics. The study underscores the efficacy of RFCs in distinguishing outlet-specific emission profiles and proposes a supplemental approach for uncovering subtle data manipulation or operational shifts. However, challenges persist in linking algorithmic findings to documented violations, emphasizing the need for integrated data frameworks to strengthen environmental oversight. This work advances the role of machine learning classifiers in emission monitoring, offering a pathway for refining CEMS management and regulatory strategy.
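The abstract does not say how the vector distances and cosine similarities were computed. The following is a minimal sketch under a few assumptions:

- Each outlet's readings are grouped by collection week.
- Each week is collapsed into a mean (or median) vector over its monitored parameters.
- Consecutive weeks are then compared by Euclidean distance and cosine similarity.

The helper names (`summarize_week`, `week_to_week_metrics`, `most_changed_week`), the week labels, and the two-parameter readings are hypothetical and not taken from the study. The sketch only shows how the metrics and their extremum could be obtained. It does not reproduce the classifiers or the misprediction probabilities the study related them to.

```python
import math
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple

Reading = Sequence[float]  # one time step: values of the monitored parameters
Week = Sequence[Reading]   # all readings collected for one outlet in one week


def summarize_week(readings: Week, how: str = "mean") -> List[float]:
    """Collapse one week of readings into a per-parameter mean or median vector."""
    agg = mean if how == "mean" else median
    # zip(*readings) turns rows (time steps) into columns (parameters)
    return [float(agg(column)) for column in zip(*readings)]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # 0.0 when either vector is all zeros


def week_to_week_metrics(
    weeks: Dict[str, Week], how: str = "mean"
) -> List[Tuple[str, float, float]]:
    """Compare each week's summary vector with the previous week's.

    Returns (week label, Euclidean distance, cosine similarity) tuples,
    following the insertion order of `weeks`."""
    labels = list(weeks)
    summaries = [summarize_week(weeks[label], how) for label in labels]
    return [
        (
            labels[i],
            euclidean_distance(summaries[i - 1], summaries[i]),
            cosine_similarity(summaries[i - 1], summaries[i]),
        )
        for i in range(1, len(labels))
    ]


def most_changed_week(metrics: List[Tuple[str, float, float]]) -> str:
    """Label of the week whose summary moved furthest from the previous week."""
    return max(metrics, key=lambda m: m[1])[0]


if __name__ == "__main__":
    # Hypothetical outlet: two parameters (e.g. SO2 and NOx), two readings per week
    weeks = {
        "W1": [[10, 20], [12, 22]],
        "W2": [[11, 21], [11, 21]],
        "W3": [[30, 5], [32, 7]],
    }
    for label, dist, cos in week_to_week_metrics(weeks):
        print(f"{label}: distance={dist:.2f}, cosine={cos:.4f}")
    print(most_changed_week(week_to_week_metrics(weeks)))  # prints W3
```

Passing `how="median"` gives the median-based variant. Using the maximum distance (or, equivalently here, the minimum cosine similarity) as the "extremum" is one illustrative choice. A flagged week would still need to be checked against other evidence, as the abstract notes.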