VAGen: waterbody segmentation with prompting for visual in-context learning

Jiapei Zhao, Nobuyoshi Yabuki, Tomohiro Fukuda
AI in Civil Engineering, 3(1). Published 2024-12-23. DOI: 10.1007/s43503-024-00042-6
https://link.springer.com/article/10.1007/s43503-024-00042-6
Citations: 0

Abstract


Effective water management and flood prevention are critical challenges for both urban and rural areas, and both require precise, timely monitoring of waterbodies. A fundamental step in monitoring is waterbody segmentation: precisely delineating waterbody boundaries in imagery. Previous research based on satellite images often lacks the resolution and contextual detail needed for local-scale analysis. This study instead uses common natural images, which are more easily accessible and provide higher resolution and more contextual information than satellite images. Segmenting waterbodies from ordinary images, however, faces several obstacles, including variations in lighting, occlusion by objects such as trees and buildings, and reflections on the water surface, all of which can mislead algorithms. The diverse shapes and textures of waterbodies, together with complex backgrounds, complicate the task further. Large-scale vision models pre-trained on large datasets are typically valued for their generalizability across downstream tasks, yet their application to waterbody segmentation from ground-level images remains underexplored. This research therefore proposes the Visual Aquatic Generalist (VAGen), a lightweight model for waterbody segmentation inspired by visual In-Context Learning (ICL) and Visual Prompting (VP). VAGen refines large visual models by adding learnable perturbations that enhance the quality of the prompts used in ICL. In the experiments, VAGen improved the mean Intersection over Union (mIoU) by 22.38% over a baseline model without learnable prompts, and it surpassed the current state-of-the-art (SOTA) task-specific models for waterbody segmentation by 6.20%. The performance evaluation and analysis indicate that VAGen substantially reduces the number of trainable parameters and the computational overhead, and show that it can feasibly be deployed on cost-limited devices such as unmanned aerial vehicles (UAVs) and mobile computing platforms. The study thereby contributes to computer vision and offers practical solutions for engineering applications in urban flood monitoring, agricultural water resource management, and environmental conservation.
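
For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it is commonly computed for a binary water/background mask. The abstract does not state the exact averaging convention used in the paper (per image or per dataset, and which classes are included), so the function names `iou` and `mean_iou`, the two-class setup, and the toy masks are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Optional, Sequence


def iou(pred: Sequence[int], target: Sequence[int], cls: int) -> Optional[float]:
    """Intersection over Union for one class over two flat label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixels labelled `cls` in both masks.
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    # Pixels labelled `cls` in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    # Return None when the class is absent from both masks (IoU undefined).
    return inter / union if union else None


def mean_iou(pred: Sequence[int], target: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU over the classes that actually occur."""
    scores = [s for s in (iou(pred, target, c) for c in classes) if s is not None]
    if not scores:
        raise ValueError("none of the classes occur in either mask")
    return sum(scores) / len(scores)


# Toy 2x4 masks, flattened row by row (assumed labels: 1 = water, 0 = background).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 1, 1, 0, 0, 0, 0, 0]

water_iou = iou(pred, target, 1)   # 2 shared water pixels / 4 in the union
miou = mean_iou(pred, target)      # mean of water IoU and background IoU
print(f"water IoU = {water_iou:.4f}, mIoU = {miou:.4f}")
```

In this toy case the water class scores 2/4 and the background class 4/6, so the mean is 7/12. In practice the masks would be model predictions and ground-truth annotations, and a library implementation of the Jaccard index would normally replace the hand-written loops.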
