Multi-level Feature Reweighting and Fusion for Instance Segmentation

Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo
{"title":"Multi-level Feature Reweighting and Fusion for Instance Segmentation","authors":"Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo","doi":"10.1109/INDIN51773.2022.9976099","DOIUrl":null,"url":null,"abstract":"Accurate instance segmentation requires high-resolution features for performing a dense pixel-wise prediction task. However, using high-resolution feature maps results in highly expensive model complexity and ineffective receptive fields. To overcome the problems of high-resolution features, conventional methods explore multi-level feature fusion that exchanges the information between low-level features at earlier layers and high-level features at top layers. Both low and high information is extracted by the hierarchical backbone network where high-level features contain more semantic cues and low-level features encompass more specific patterns. Thus, adopting these features to the training segmentation model is necessary, and designing a more efficient multi-level feature fusion is crucial. Existing methods balance such information by using top-down and bottom-up pathway connections with more inefficient convolution layers to produce richer multi-scale features. In this work, we contribute two folds: (1) a simple but effective multilevel feature reweighting layer is proposed to strengthen deep high-level features based on channel reweighting generated from multiple features of the backbone, and (2) an efficient fusion block is proposed to process low-resolution features in a depth-to-spatial manner and combine enhanced multi-level features together. These designs enable the segmentation models to predict instance kernels for mask generation on high-level feature maps. To verify the effectiveness of the proposed method, we conduct experiments on the challenging benchmark dataset MS-COCO. Surprisingly, our simple network outperforms the baseline in both accuracy and inference speed. More specifically, we achieve 35.4% APmask at 19.5 FPS on a GPU device, becoming a state-of-the-art instance segmentation method.","PeriodicalId":359190,"journal":{"name":"2022 IEEE 20th International Conference on Industrial Informatics (INDIN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 20th International Conference on Industrial Informatics (INDIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIN51773.2022.9976099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Accurate instance segmentation requires high-resolution features for performing a dense pixel-wise prediction task. However, using high-resolution feature maps results in highly expensive model complexity and ineffective receptive fields. To overcome the problems of high-resolution features, conventional methods explore multi-level feature fusion that exchanges information between low-level features at earlier layers and high-level features at top layers. Both low- and high-level information are extracted by the hierarchical backbone network, where high-level features contain more semantic cues and low-level features encompass more specific patterns. Thus, adopting these features in the segmentation model is necessary, and designing a more efficient multi-level feature fusion is crucial. Existing methods balance such information by using top-down and bottom-up pathway connections with additional, inefficient convolution layers to produce richer multi-scale features. In this work, our contributions are twofold: (1) a simple but effective multi-level feature reweighting layer is proposed to strengthen deep high-level features based on channel reweighting generated from multiple features of the backbone, and (2) an efficient fusion block is proposed to process low-resolution features in a depth-to-spatial manner and combine the enhanced multi-level features. These designs enable the segmentation model to predict instance kernels for mask generation on high-level feature maps. To verify the effectiveness of the proposed method, we conduct experiments on the challenging benchmark dataset MS-COCO. Surprisingly, our simple network outperforms the baseline in both accuracy and inference speed. More specifically, we achieve 35.4% APmask at 19.5 FPS on a GPU device, making it a state-of-the-art instance segmentation method.
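The abstract outlines two building blocks: a channel-reweighting layer driven by statistics gathered from multiple backbone levels, and a fusion block that trades channel depth for spatial resolution before combining levels. The PyTorch sketch below illustrates one plausible reading of those ideas; the module names, channel sizes, SE-style gating, pixel-shuffle upsampling, and reduction ratio are assumptions made for illustration and are not taken from the paper's implementation.

```python
# A minimal sketch, assuming SE-style channel gating and pixel-shuffle
# ("depth-to-space") upsampling; not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelReweighting(nn.Module):
    """Reweight a high-level feature map with channel weights computed from
    globally pooled statistics of several backbone levels (assumed)."""

    def __init__(self, level_channels, out_channels, reduction=4):
        super().__init__()
        total = sum(level_channels)
        self.fc = nn.Sequential(
            nn.Linear(total, total // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(total // reduction, out_channels),
            nn.Sigmoid(),
        )

    def forward(self, high_level, all_levels):
        # Global average pooling on every level, then concatenate statistics.
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in all_levels], dim=1)
        weights = self.fc(pooled)                      # (N, out_channels)
        return high_level * weights[:, :, None, None]  # channel reweighting


class DepthToSpaceFusion(nn.Module):
    """Fuse a low-resolution map into a higher-resolution one by trading
    channels for spatial resolution (pixel shuffle) and summing."""

    def __init__(self, low_channels, high_channels, scale=2):
        super().__init__()
        # Expand channels so pixel_shuffle yields `high_channels` maps.
        self.expand = nn.Conv2d(low_channels, high_channels * scale * scale, 1)
        self.scale = scale

    def forward(self, low, high):
        up = F.pixel_shuffle(self.expand(low), self.scale)
        return high + up


if __name__ == "__main__":
    # Toy FPN-like levels at strides 8/16/32 with assumed channel counts.
    c3 = torch.randn(1, 256, 64, 64)
    c4 = torch.randn(1, 256, 32, 32)
    c5 = torch.randn(1, 256, 16, 16)

    reweight = MultiLevelReweighting([256, 256, 256], out_channels=256)
    fuse = DepthToSpaceFusion(low_channels=256, high_channels=256, scale=2)

    p5 = reweight(c5, [c3, c4, c5])  # strengthen the top level
    p4 = fuse(p5, c4)                # lift it to stride 16 and fuse
    print(p4.shape)                  # torch.Size([1, 256, 32, 32])
```

Pixel shuffle is used here only as one interpretation of the "depth-to-spatial" wording: it upsamples the low-resolution map without interpolation or extra convolutions at the higher resolution, which matches the abstract's emphasis on keeping the fusion path cheap.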