Improvement in classification capabilities of surface water samples based on analysis of multidimensional data from gas sensor array
Magdalena Piłat-Rożek, Grzegorz Łagód
Annals of agricultural and environmental medicine : AAEM, 32(2): 222-229. Journal Article. Published 2025-06-27 (Epub 2025-06-25).
DOI: 10.26444/aaem/206945 (https://doi.org/10.26444/aaem/206945)
Abstract
Introduction and objective: Electronic noses (e-noses) have been shown to differentiate successfully between drainage and river water samples. However, the classification accuracy reported in the previous article in this series left room for improvement. The aim of this article was to improve the classification accuracy of surface water samples analyzed with a gas sensor array.
Material and methods: The multidimensional data used to train the machine learning models were derived from river water, drainage water and synthetic air samples measured with an array of 17 gas sensors. Unsupervised methods were applied first: t-SNE for dimensionality reduction and visualization on a 2-dimensional plane, and k-medians for clustering. Subsequently, the supervised classifiers XGBoost and AdaBoost.M1 were trained and compared in terms of how well they assigned objects to the correct classes.
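The clustering step named above can be illustrated with a minimal k-medians sketch. It is not the authors' implementation: the function name `k_medians`, its parameters, and the Manhattan-distance assignment rule are assumptions made for illustration, and the abstract does not state which software or settings were used.

```python
import random
from statistics import median


def k_medians(points, k, n_iter=50, seed=0):
    """Hypothetical helper: cluster points (lists of floats) with k-medians.

    Assumes Manhattan (L1) distance for assignment, the usual pairing with
    coordinate-wise medians. Expects n_iter >= 1.
    """
    rng = random.Random(seed)
    # Initialise centers from k distinct observations.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest center by L1 distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum(abs(a - b) for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the coordinate-wise median
        # of the points currently assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [median(col) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers
```

In a pipeline like the one described, the rows passed to such a function would be sensor-array readings or their low-dimensional embedding. Medians are less sensitive to outlying readings than the means used by k-means, which is a common reason for choosing this variant.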
Results: The t-SNE visualization and k-medians clustering clearly distinguished the observations from the water sample and the different drainage samples. On the test set, the supervised XGBoost and AdaBoost.M1 models classified 88.8% and 89.2% of observations correctly, respectively.
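For readers unfamiliar with AdaBoost.M1, the following is a minimal sketch of its reweighting and voting scheme, using decision stumps as weak learners. The stump learner, all function names, and the toy data in the usage note are assumptions for illustration only; the abstract does not describe the base learner, hyperparameters, or software the authors used.

```python
import math


def fit_stump(X, y, w):
    """Weighted decision stump: returns (feature, threshold, left, right)."""
    classes = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            sides, err = [], 0.0
            for go_left in (True, False):
                # Weighted class totals on this side of the split.
                tot = {c: 0.0 for c in classes}
                for row, label, wi in zip(X, y, w):
                    if (row[j] <= t) == go_left:
                        tot[label] += wi
                pred = max(classes, key=lambda c: tot[c])
                sides.append(pred)
                err += sum(tot.values()) - tot[pred]
            if best is None or err < best[0]:
                best = (err, j, t, sides[0], sides[1])
    return best[1:]


def stump_predict(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def adaboost_m1(X, y, n_rounds=10):
    """Return a list of (stump, vote_weight) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = [stump_predict(stump, r) != lab for r, lab in zip(X, y)]
        eps = sum(wi for wi, m in zip(w, miss) if m)
        if eps >= 0.5:  # AdaBoost.M1 needs weighted error below 1/2
            break
        eps = max(eps, 1e-12)  # guard against division by zero
        beta = eps / (1.0 - eps)
        # Shrink the weights of correctly classified samples, renormalise.
        w = [wi if m else wi * beta for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((stump, math.log(1.0 / beta)))
    return ensemble


def predict(ensemble, row):
    """Weighted majority vote; assumes a non-empty ensemble."""
    votes = {}
    for stump, alpha in ensemble:
        label = stump_predict(stump, row)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)
```

As a usage illustration with made-up one-feature data (not measurements from the study), `adaboost_m1([[0.1], [0.2], [1.0], [1.1], [1.2], [2.0], [2.1], [2.2], [2.3]], ["air", "air", "river", "river", "river", "drainage", "drainage", "drainage", "drainage"], n_rounds=3)` yields an ensemble whose weighted vote separates three classes even though each stump can output only two labels.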
Conclusions: Although the differences in medians between sample groups were not statistically significant in most of the multiple comparisons for all the classical indicators, the electronic nose made it possible to differentiate and correctly classify surface water samples with high accuracy.