MFC-Net: Amodal instance segmentation with multi-path fusion and context-awareness

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Yunfei Yang, Hongwei Deng, Yichun Wu
Journal: Image and Vision Computing, Volume 158, Article 105539
DOI: 10.1016/j.imavis.2025.105539
Published: 2025-04-12
Citations: 0

Abstract

Amodal instance segmentation aims to perceive an entire instance in an image, segmenting both the visible parts of an object and the regions that may be occluded. However, existing amodal instance segmentation methods produce coarse mask edges and perform poorly on objects with large size differences. In addition, occlusion greatly limits model performance. To address these problems, this work proposes an amodal instance segmentation method, MFC-Net, to accurately segment objects in an image. To sharpen coarse mask edges, the model introduces a multi-path transformer structure to obtain finer object semantic features and boundary information, improving the accuracy of edge-region segmentation. For the poor segmentation of object instances with large size differences, we design an adaptive feature fusion module (AFF), which dynamically captures scale changes related to object size and fuses multi-scale semantic features, so that the model obtains a receptive field adapted to the object size. To address poor segmentation under occlusion, we design a context-aware mask segmentation module (CMS) in the prediction head to make a preliminary prediction of the object's amodal region. The module enhances the model's amodal perception by modeling long-range dependencies between objects and capturing contextual information for the occluded part of an object. Compared with state-of-the-art methods, MFC-Net achieves a mAP of 73.3% on the D2SA dataset, and 33.9% and 36.9% on the KINS and COCOA-cls datasets, respectively. Moreover, MFC-Net produces complete and detailed amodal masks.
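The abstract describes the AFF module only at a high level: it pools multi-scale features, derives input-dependent weights, and blends the scales so the effective receptive field matches the object size. The paper's actual implementation is not given here, so the following is a minimal NumPy sketch of that general idea; the gating matrix `proj_weights`, the nearest-neighbour resizing, and the single-scalar-per-scale weighting are all illustrative assumptions, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    c, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

def adaptive_feature_fusion(features, proj_weights):
    """Fuse multi-scale feature maps with input-dependent scalar weights.

    features:     list of (C, H_i, W_i) maps from different pyramid levels.
    proj_weights: (num_scales, C) scoring matrix -- a stand-in for a
                  learned gating layer (hypothetical, for illustration).
    """
    out_h = max(f.shape[1] for f in features)
    out_w = max(f.shape[2] for f in features)
    # Global average pool each scale -> (num_scales, C) descriptors
    pooled = np.stack([f.mean(axis=(1, 2)) for f in features])
    # One scalar score per scale, normalised across scales
    alphas = softmax((pooled * proj_weights).sum(axis=1))
    # Resize every scale to the finest resolution and blend
    fused = sum(a * upsample_nearest(f, out_h, out_w)
                for a, f in zip(alphas, features))
    return fused, alphas
```

Because the weights come from the pooled content of each input, a large object (strong response at coarse scales) and a small object (strong response at fine scales) receive different blends, which is the behaviour the abstract attributes to AFF.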
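Likewise, "modeling long-range dependencies" and "capturing contextual information of the occluded part" in the CMS module is commonly realised with spatial self-attention, where every location can attend to every other location. The sketch below shows that generic mechanism in NumPy; it is not the paper's CMS implementation, and the projection matrices `wq`, `wk`, `wv` stand in for layers that would be learned in a real model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_attention(feat, wq, wk, wv):
    """Single-head self-attention over the spatial positions of a
    (C, H, W) feature map, so every location (e.g. a visible fragment)
    can aggregate context from every other location (e.g. the region
    hidden behind an occluder).

    wq, wk, wv: (C, D) projection matrices (illustrative placeholders).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (HW, C): one token per pixel
    q, k, v = x @ wq, x @ wk, x @ wv             # (HW, D) each
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (HW, HW)
    out = attn @ v                               # context-enriched tokens
    return out.T.reshape(-1, h, w), attn
```

Each row of `attn` is a distribution over all spatial positions, which is what lets evidence from visible pixels flow into the prediction for occluded ones.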
Source journal: Image and Vision Computing (Engineering, Electrical & Electronic)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.