Mingde Yao, King Man Tam, Menglu Wang, Lingen Li, Rei Kawakami
{"title":"Language-guided reasoning segmentation for underwater images","authors":"Mingde Yao , King Man Tam , Menglu Wang , Lingen Li , Rei Kawakami","doi":"10.1016/j.inffus.2025.103177","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper, we introduce Language-Guided Reasoning Segmentation (LGRS), a framework that leverages human language instructions to guide underwater image segmentation. Unlike existing methods, that rely solely on visual cues or predefined categories, LGRS enables segmentation at underwater images based on detailed, context-aware textual descriptions, allowing it to tackle more challenging scenarios, such as distinguishing visually similar objects or identifying species from complex queries. To facilitate the development and evaluation of this approach, we create an underwater image-language segmentation dataset, the first of its kind, which pairs underwater images with detailed textual descriptions and corresponding segmentation masks. This dataset provides a foundation for training models capable of processing both visual and linguistic inputs simultaneously. Furthermore, LGRS incorporates reasoning capabilities through large language models, enabling the system to interpret complex relationships between objects in the scene and perform accurate segmentation in dynamic underwater environments. Notably, our method also demonstrates strong zero-shot segmentation capabilities, enabling the model to generalize to unseen categories without additional training. 
Experimental results show that LGRS outperforms existing underwater image segmentation methods in both accuracy and flexibility, offering a foundation for further advancements.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103177"},"PeriodicalIF":14.7000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525002507","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Language-guided reasoning segmentation for underwater images
In this paper, we introduce Language-Guided Reasoning Segmentation (LGRS), a framework that leverages human language instructions to guide underwater image segmentation. Unlike existing methods that rely solely on visual cues or predefined categories, LGRS enables segmentation of underwater images based on detailed, context-aware textual descriptions, allowing it to tackle more challenging scenarios, such as distinguishing visually similar objects or identifying species from complex queries. To facilitate the development and evaluation of this approach, we create an underwater image-language segmentation dataset, the first of its kind, which pairs underwater images with detailed textual descriptions and corresponding segmentation masks. This dataset provides a foundation for training models capable of processing visual and linguistic inputs simultaneously. Furthermore, LGRS incorporates reasoning capabilities through large language models, enabling the system to interpret complex relationships between objects in the scene and perform accurate segmentation in dynamic underwater environments. Notably, our method also demonstrates strong zero-shot segmentation capabilities, enabling the model to generalize to unseen categories without additional training. Experimental results show that LGRS outperforms existing underwater image segmentation methods in both accuracy and flexibility, offering a foundation for further advancements.
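The core idea the abstract describes — grounding a textual query in per-pixel image features to produce a segmentation mask — can be illustrated with a minimal similarity-based sketch. This is a hypothetical toy, not the paper's actual LGRS architecture: the function name, the cosine-similarity scoring, and the fixed threshold are all assumptions for illustration, standing in for the learned vision-language model and LLM reasoning the paper uses.

```python
import numpy as np

def language_guided_mask(feat_map, text_emb, threshold=0.5):
    """Binary mask from cosine similarity between per-pixel image
    features (H, W, D) and a text-query embedding (D,).

    A toy stand-in for a language-guided segmentation head; in a real
    system both embeddings would come from a trained vision-language model.
    """
    # L2-normalize features and query so the dot product is cosine similarity
    f = feat_map / (np.linalg.norm(feat_map, axis=-1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    sim = f @ t          # (H, W) similarity of each pixel to the query
    return sim > threshold

# Toy demo: a 2x2 "feature map" where the left column aligns with the query
feats = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.7, 0.7], [-1.0, 0.0]]])
query = np.array([1.0, 0.0])   # hypothetical embedding of a text prompt
mask = language_guided_mask(feats, query)
```

In this sketch the mask selects the two pixels whose features point in roughly the query's direction; a zero-shot query for an unseen category works the same way, since only the text embedding changes.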
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.