Coverage estimation of benthic habitat features by semantic segmentation of underwater imagery from South-eastern Baltic reefs using deep learning models

Impact factor 2.6 · Q2 Oceanography · CAS Tier 3 (Earth Sciences)
Andrius Šiaulys, Evaldas Vaičiukynas, Saulė Medelytė, Kazimieras Buškus
{"title":"Coverage estimation of benthic habitat features by semantic segmentation of underwater imagery from South-eastern Baltic reefs using deep learning models","authors":"Andrius Šiaulys ,&nbsp;Evaldas Vaičiukynas ,&nbsp;Saulė Medelytė ,&nbsp;Kazimieras Buškus","doi":"10.1016/j.oceano.2023.12.004","DOIUrl":null,"url":null,"abstract":"<div><p>Underwater imagery (UI) is an important and sometimes the only tool for mapping hard-bottom habitats. With the development of new camera systems, from hand-held or simple “drop-down” cameras to ROV/AUV-mounted video systems, video data collection has increased considerably. However, the processing and analysing of vast amounts of imagery can become very labour-intensive, thus making it ineffective both time-wise and financially. This task could be simplified if the processes or their intermediate steps could be done automatically. Luckily, the rise of AI applications for automatic image analysis tasks in the last decade has empowered researchers with robust and effective tools. In this study, two ways to make UI analysis more efficient were tested with eight dominant visual features of the Southeastern Baltic reefs: 1) the simplification of video processing and expert annotation efforts by skipping the video mosaicking step and reducing the number of frames analysed; 2) the application of semantic segmentation of UI using deep learning models. The results showed that the annotation of individual frames provides similar results compared to 2D mosaics; moreover, the reduction of frames by 2–3 times resulted in only minor differences from the baseline. Semantic segmentation using the PSPNet model as the deep learning architecture was extensively evaluated, applying three variants of validation. The accuracy of segmentation, as measured by the intersection-over-union, was mediocre; however, estimates of visual coverage percentages were fair: the difference between the expert annotations and model-predicted segmentation was less than 6–8%, which could be considered an encouraging result.</p></div>","PeriodicalId":54694,"journal":{"name":"Oceanologia","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0078323423000933/pdfft?md5=0d9a8172bcf0bc44b50a5d408bd640ba&pid=1-s2.0-S0078323423000933-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Oceanologia","FirstCategoryId":"89","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0078323423000933","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OCEANOGRAPHY","Score":null,"Total":0}
引用次数: 0

Abstract

Underwater imagery (UI) is an important and sometimes the only tool for mapping hard-bottom habitats. With the development of new camera systems, from hand-held or simple “drop-down” cameras to ROV/AUV-mounted video systems, the volume of collected video data has increased considerably. However, processing and analysing vast amounts of imagery can be very labour-intensive, making it inefficient in terms of both time and cost. The task could be simplified if these processes, or their intermediate steps, were performed automatically. Fortunately, the rise of AI applications for automatic image analysis over the last decade has equipped researchers with robust and effective tools. In this study, two ways of making UI analysis more efficient were tested on eight dominant visual features of South-eastern Baltic reefs: 1) simplifying video processing and expert annotation by skipping the video-mosaicking step and reducing the number of frames analysed; 2) applying semantic segmentation of UI using deep learning models. The results showed that annotating individual frames yields results similar to those obtained from 2D mosaics; moreover, reducing the number of frames by a factor of 2–3 produced only minor differences from the baseline. Semantic segmentation using the PSPNet model as the deep learning architecture was evaluated extensively with three variants of validation. Segmentation accuracy, as measured by intersection-over-union, was mediocre; however, the estimates of visual coverage percentages were fair: the difference between expert annotations and model-predicted segmentation was less than 6–8%, which can be considered an encouraging result.
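The two evaluation metrics mentioned in the abstract, intersection-over-union for segmentation accuracy and the difference in visual coverage percentages between expert annotations and model predictions, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy masks and the assumption of integer-labelled class masks for the eight visual features are hypothetical.

```python
# Illustrative sketch (not the authors' code) of how per-class IoU and
# coverage-percentage differences could be computed when comparing an
# expert-annotated mask with a model-predicted segmentation mask.
import numpy as np

N_CLASSES = 8  # eight dominant visual features of the reefs (assumption)


def per_class_iou(expert: np.ndarray, predicted: np.ndarray, n_classes: int = N_CLASSES):
    """Intersection-over-union for each class label in two integer masks."""
    ious = {}
    for c in range(n_classes):
        expert_c = expert == c
        pred_c = predicted == c
        union = np.logical_or(expert_c, pred_c).sum()
        if union == 0:
            continue  # class absent from both masks; IoU undefined, skip it
        intersection = np.logical_and(expert_c, pred_c).sum()
        ious[c] = intersection / union
    return ious


def coverage_difference(expert: np.ndarray, predicted: np.ndarray, n_classes: int = N_CLASSES):
    """Absolute difference (in percentage points) between expert-annotated
    and model-derived coverage estimates for each class."""
    total = expert.size
    diffs = {}
    for c in range(n_classes):
        expert_cov = 100.0 * (expert == c).sum() / total
        pred_cov = 100.0 * (predicted == c).sum() / total
        diffs[c] = abs(expert_cov - pred_cov)
    return diffs


# Usage with two toy 4x4 masks holding class labels 0-3
expert_mask = np.array([[0, 0, 1, 1],
                        [0, 2, 2, 1],
                        [3, 3, 2, 1],
                        [3, 3, 2, 2]])
predicted_mask = np.array([[0, 0, 1, 1],
                           [0, 2, 1, 1],
                           [3, 2, 2, 1],
                           [3, 3, 2, 2]])
print(per_class_iou(expert_mask, predicted_mask, n_classes=4))
print(coverage_difference(expert_mask, predicted_mask, n_classes=4))
```

The sketch shows why the two metrics can diverge: IoU penalises every misplaced pixel, whereas coverage percentages only compare the total area assigned to each class, so a model with mediocre IoU can still produce coverage estimates close to the expert annotation.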

Source journal
Oceanologia (Geosciences: Oceanography)
CiteScore: 5.30
Self-citation rate: 6.90%
Articles per year: 63
Review time: 146 days
Journal description: Oceanologia is an international journal that publishes results of original research in the field of marine sciences with emphasis on the European seas.