Unsupervised deep clustering as a tool for the identification of dark taxa in biomonitoring
Djuradj Milošević, Aleksandar Milosavljević, Predrag Simović, Aleksandra Trajković, Andrew Medeiros, Dimitrija Savić-Zdravković, Katarina Stojanović, Tijana Kostić, Bratislav Predić
Environmental Monitoring and Assessment, 197(8), 858. Published 2025-07-04. DOI: 10.1007/s10661-025-14293-y (https://doi.org/10.1007/s10661-025-14293-y)
Abstract
Identifying aquatic macroinvertebrates is difficult, particularly for dark taxa like Chironomidae, whose complex morphological features and unresolved taxonomy hinder the efficiency of routine biomonitoring. This study proposes a deep clustering approach that uses β-variational autoencoders (β-VAEs) to identify chironomid larval morphotypes in a completely unsupervised manner. A dataset of 5365 chironomid specimens from 37 taxa was used to develop and test multiple β-VAE models. The number of latent features (20-80) and the β hyperparameter (0.1-10) were systematically varied to optimize unsupervised classification accuracy. Loss analysis revealed that models with fewer latent features exhibited better feature disentanglement and lower total correlation (TC) loss, which improved the unsupervised classification of chironomid taxa. The model with 30 latent features and β = 0.1 outperformed the others, achieving the highest Normalized Mutual Information (NMI) scores for clustering with the K-means (0.4438) and Louvain (0.4813) algorithms. Entropy analysis revealed that species such as Diamesa insignipes, Rheocricotopus fuscipes, and Tvetenia tshernovskii were difficult for the β-VAE model to classify, as specimens of the same species were often assigned to multiple clusters. The β-VAE results demonstrate the potential of unsupervised clustering for taxonomic identification and offer a scalable approach for biomonitoring programs. By enabling identification in an unsupervised manner, this study contributes to the inclusion of dark taxa in bioassessment and the exploration of cryptic diversity, advancing biomonitoring and biodiversity conservation.
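The two evaluation ideas mentioned in the abstract, NMI between known taxa and cluster assignments and a per-taxon entropy that flags taxa split across several clusters, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the arithmetic-mean normalization of NMI, the use of natural logarithms, and the toy labels are all assumptions made for illustration, and the abstract does not state which NMI variant or entropy definition the study used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, cluster_labels):
    """Normalized Mutual Information between two labelings.

    Assumption: MI is normalized by the arithmetic mean of the two entropies.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    n_true = Counter(true_labels)
    n_clu = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        mi += (n_tc / n) * math.log(n_tc * n / (n_true[t] * n_clu[c]))
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    # If both labelings are constant, treat them as in perfect agreement.
    return mi / denom if denom > 0 else 1.0


def per_taxon_entropy(true_labels, cluster_labels):
    """Entropy of cluster assignments within each taxon.

    A value of 0 means every specimen of that taxon fell into one cluster;
    larger values mean the taxon is spread over several clusters.
    """
    by_taxon = {}
    for t, c in zip(true_labels, cluster_labels):
        by_taxon.setdefault(t, []).append(c)
    return {t: entropy(cs) for t, cs in by_taxon.items()}


# Hypothetical toy data: taxon "A" lands in one cluster, taxon "B" is split in two.
toy_taxa = ["A"] * 4 + ["B"] * 4
toy_clusters = [0, 0, 0, 0, 1, 1, 2, 2]

if __name__ == "__main__":
    print("NMI:", nmi(toy_taxa, toy_clusters))
    print("Per-taxon entropy:", per_taxon_entropy(toy_taxa, toy_clusters))
```

On the toy data the NMI is about 0.8, taxon "A" has zero entropy, and taxon "B" has entropy ln 2, which mirrors the situation the abstract describes for species whose specimens were assigned to multiple clusters.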
About the journal:
Environmental Monitoring and Assessment emphasizes technical developments and data arising from environmental monitoring and assessment, the use of scientific principles in the design of monitoring systems at the local, regional and global scales, and the use of monitoring data in assessing the consequences of natural resource management actions and pollution risks to man and the environment.