Mathew Wyatt, Sharyn Hickey, Ben Radford, Manuel Gonzalez-Rivero, Nader Boutros, Nikolaus Callow, Nicole Ryan, Arjun Chennu, Mohammed Bennamoun, James Gilmour
Safe AI for coral reefs: Benchmarking out-of-distribution detection algorithms for coral reef image surveys
Ecological Informatics, volume 90, Article 103207. Published 2025-05-20.
DOI: 10.1016/j.ecoinf.2025.103207
URL: https://www.sciencedirect.com/science/article/pii/S157495412500216X
Citations: 0
Abstract
Although deep learning has demonstrated significant advances in qualitative domains, deep learning algorithms remain poor at quantifying the uncertainty of their predictions. This is especially true when they are applied to data that have shifted away from the distribution on which they were trained. This has major implications for the use of deep learning to accurately estimate change in environmental monitoring applications. In image classification for coral reef habitats, time-series imagery is rarely consistent because of changing environmental conditions, differing sensors and inconsistencies in data capture, so traditional machine learning metrics do not carry over to new out-of-distribution datasets.
1. For this reason, we benchmark the latest state-of-the-art out-of-distribution (OOD) detection algorithms on publicly available coral reef image datasets and evaluate the histogram intersection of outlier scores as an indicator for human intervention.
2. A comparative analysis shows that the performance of OOD detection algorithms is variable and highly dependent on the composition of the in-distribution and out-of-distribution data. K-nearest-neighbour (KNN) distance was the most consistent across datasets, followed by virtual-logit matching (ViM).
3. This research provides a compelling example of how a handful of openly available algorithms can easily be used as a complementary indicator alongside confidence (softmax probability), in turn enabling more efficient and safer deployment of deep learning for rapid reporting on coral reef habitats.
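To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes feature embeddings have already been extracted from a trained classifier (synthetic vectors stand in for them here), scores each sample by its distance to the k-th nearest training feature, and then compares two score distributions by histogram intersection. The function names, `k=5`, `bins=50` and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def knn_ood_scores(train_feats, query_feats, k=5):
    """Outlier score = Euclidean distance to the k-th nearest training feature.

    Features are L2-normalised first (an assumption of this sketch).
    Larger scores suggest the query lies further from the training data.
    """
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    query = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    # Pairwise distances, shape (n_query, n_train); fine for small arrays.
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    # k-th smallest distance in each row.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def histogram_intersection(scores_a, scores_b, bins=50):
    """Overlap in [0, 1] between two score distributions on shared bins."""
    lo = min(scores_a.min(), scores_b.min())
    hi = max(scores_a.max(), scores_b.max())
    if hi == lo:  # all scores identical, so the distributions coincide
        return 1.0
    edges = np.linspace(lo, hi, bins + 1)
    hist_a, _ = np.histogram(scores_a, bins=edges)
    hist_b, _ = np.histogram(scores_b, bins=edges)
    p_a = hist_a / hist_a.sum()
    p_b = hist_b / hist_b.sum()
    return float(np.minimum(p_a, p_b).sum())


# Synthetic stand-ins for embeddings (hypothetical data, not reef imagery).
rng = np.random.default_rng(0)
dim = 8
centre_id = np.zeros(dim)
centre_id[0] = 3.0
centre_shift = np.zeros(dim)
centre_shift[1] = 3.0

train = centre_id + 0.3 * rng.standard_normal((500, dim))
val_id = centre_id + 0.3 * rng.standard_normal((200, dim))
new_id = centre_id + 0.3 * rng.standard_normal((200, dim))
new_shifted = centre_shift + 0.3 * rng.standard_normal((200, dim))

ref_scores = knn_ood_scores(train, val_id)
overlap_same = histogram_intersection(ref_scores, knn_ood_scores(train, new_id))
overlap_shift = histogram_intersection(ref_scores, knn_ood_scores(train, new_shifted))
print(f"overlap with in-distribution batch: {overlap_same:.2f}")
print(f"overlap with shifted batch:         {overlap_shift:.2f}")
```

In this sketch, a low overlap between the reference scores and the scores of a new survey batch would be the cue to route that batch to a human annotator; what counts as "low" is a threshold the practitioner would need to choose and is not specified here.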
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.