Feasibility of classification of drainage and river water quality using machine learning methods based on multidimensional data from a gas sensor array.
Authors: Magdalena Piłat-Rożek, Grzegorz Łagód
Journal: Annals of Agricultural and Environmental Medicine, vol. 31, no. 4, pp. 513-519 (published 2024-12-22; Epub 2024-12-02)
DOI: 10.26444/aaem/196101 (https://doi.org/10.26444/aaem/196101)
Abstract
Objective: The aim of the study was to verify whether an electronic nose (e-nose) system, consisting of an array of 17 gas sensors with a signal analysis system, is a useful tool for the classification and preliminary assessment of the quality of drainage water.
Material and methods: Water samples for analysis were collected in Park Ludowy (People's Park), located next to the Bystrzyca River, near the city center of Lublin in eastern Poland. Drainage water was sampled at 4 different points. Samples of synthetic air and of river water taken from the Bystrzyca River were used for reference. All water samples were tested using an MOS gas sensor array. To assess how well the e-nose screened, discriminated, and preliminarily classified and grouped the samples, their properties were also determined using reference methods for assessing surface water quality. Principal component analysis (PCA), Kohonen's self-organizing map (SOM) with cluster boundaries superimposed by McQuitty's method, random forest, and a multilayer perceptron (MLP) neural network were used to visualize and classify the multivariate data.
Results: The visualization and dimensionality reduction methods (PCA and SOM) did not clearly separate the observations from the different drainage water samples. The supervised methods, random forest and MLP, classified the samples much better, achieving 84.3% and 87.6% correct classifications on the test set, respectively.
Conclusions: Statistical analysis of the chemical properties of the samples showed that even reference tests are unable to clearly distinguish the samples in terms of a single parameter. However, the e-nose method makes it possible to distinguish these samples from a reference sample derived from river water and a clean air sample.
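The workflow named in the abstract (PCA for visualization, then random forest and MLP classifiers scored on a held-out test set) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the sensor readings are synthetic, and scikit-learn, the number of observations per class, the 70/30 split, and all hyperparameters are assumptions chosen only for illustration. Only the 17 sensors and the six sample classes (4 drainage points, river water, synthetic air) come from the abstract, and the sketch does not reproduce the reported 84.3% and 87.6% accuracies.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_SENSORS = 17     # from the abstract: array of 17 gas sensors
N_CLASSES = 6      # from the abstract: 4 drainage points + river water + synthetic air
N_PER_CLASS = 100  # assumption: the abstract does not state the number of observations


def make_synthetic_readings(seed=0):
    """Generate placeholder sensor readings (NOT real e-nose data)."""
    rng = np.random.default_rng(seed)
    # One hypothetical mean response vector per sample class.
    centers = rng.normal(0.0, 1.0, size=(N_CLASSES, N_SENSORS))
    X = np.vstack(
        [c + rng.normal(0.0, 0.8, size=(N_PER_CLASS, N_SENSORS)) for c in centers]
    )
    y = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)
    return X, y


def run_sketch(seed=0):
    X, y = make_synthetic_readings(seed)

    # Unsupervised step: project standardized readings onto 2 principal
    # components, as one would before plotting a PCA score plot.
    pca_scores = PCA(n_components=2).fit_transform(
        StandardScaler().fit_transform(X)
    )

    # Supervised step: hold out a stratified test set (70/30 is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X_train, y_train)

    # The MLP is sensitive to feature scale, so it is wrapped with a scaler.
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
    )
    mlp.fit(X_train, y_train)

    return {
        "pca_scores": pca_scores,
        "rf_accuracy": accuracy_score(y_test, rf.predict(X_test)),
        "mlp_accuracy": accuracy_score(y_test, mlp.predict(X_test)),
    }


if __name__ == "__main__":
    result = run_sketch()
    print("PCA score matrix shape:", result["pca_scores"].shape)
    print("Random forest test accuracy:", result["rf_accuracy"])
    print("MLP test accuracy:", result["mlp_accuracy"])
```

Test-set accuracy here is the fraction of held-out observations assigned to the correct class, which is the sense in which the abstract reports "correct classifications on the test set". Because the synthetic classes are well separated by construction, the accuracies printed by this sketch say nothing about real drainage water samples.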
About the journal:
All papers within the scope indicated by the following sections of the journal may be submitted:
Biological agents posing occupational risk in agriculture, forestry, food industry and wood industry and diseases caused by these agents (zoonoses, allergic and immunotoxic diseases).
Health effects of chemical pollutants in agricultural areas, including occupational and non-occupational effects of agricultural chemicals (pesticides, fertilizers) and effects of industrial disposal (heavy metals, sulphur, etc.) contaminating the atmosphere, soil and water.
Exposure to physical hazards associated with the use of machinery in agriculture and forestry: noise, vibration, dust.
Prevention of occupational diseases in agriculture, forestry, food industry and wood industry.
Work-related accidents and injuries in agriculture, forestry, food industry and wood industry: incidence, causes, social aspects and prevention.
State of the health of rural communities depending on various factors: social factors, accessibility of medical care, etc.