Multi-level Feature Reweighting and Fusion for Instance Segmentation

2022 IEEE 20th International Conference on Industrial Informatics (INDIN). Pub Date: 2022-07-25. DOI: 10.1109/INDIN51773.2022.9976099

Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo

Citations: 1

Abstract

Multi-level Feature Reweighting and Fusion for Instance Segmentation

Accurate instance segmentation requires high-resolution features because it is a dense, pixel-wise prediction task. However, high-resolution feature maps greatly increase model complexity and yield ineffective receptive fields. To address these problems, conventional methods use multi-level feature fusion, which exchanges information between low-level features from earlier layers and high-level features from top layers. Both kinds of information are extracted by a hierarchical backbone network, in which high-level features carry more semantic cues and low-level features capture more specific patterns. Using both when training a segmentation model is therefore necessary, and designing a more efficient multi-level feature fusion is crucial. Existing methods balance this information through top-down and bottom-up pathway connections, adding further inefficient convolution layers to produce richer multi-scale features. In this work, our contribution is twofold: (1) a simple but effective multi-level feature reweighting layer that strengthens deep high-level features through channel reweighting generated from multiple backbone features, and (2) an efficient fusion block that processes low-resolution features in a depth-to-spatial manner and combines the enhanced multi-level features. These designs enable the segmentation model to predict instance kernels for mask generation on high-level feature maps. To verify the effectiveness of the proposed method, we conduct experiments on the challenging MS-COCO benchmark. Surprisingly, our simple network outperforms the baseline in both accuracy and inference speed. Specifically, we achieve 35.4% mask AP (APmask) at 19.5 FPS on a GPU device, making it a state-of-the-art instance segmentation method.
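The abstract names two building blocks without giving their internals, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "depth-to-spatial" means the usual depth-to-space (pixel shuffle) rearrangement, that channel weights come from global average pooling followed by a sigmoid, and that fusion is an element-wise sum. The helper names (channel_weights, reweight, depth_to_space, reweight_and_fuse) are hypothetical, and feature maps are plain nested lists of shape (C, H, W) to keep the sketch dependency-free.

```python
import math


def global_avg_pool(x):
    """Mean of each channel of a (C, H, W) nested list -> list of C floats."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def channel_weights(levels):
    """Assumed gate: average the pooled descriptors of all levels, then sigmoid.

    Every level must already have the same channel count (an assumption here).
    """
    pooled = [global_avg_pool(x) for x in levels]
    n_ch = len(pooled[0])
    assert all(len(p) == n_ch for p in pooled), "channel counts must match"
    mean = [sum(p[c] for p in pooled) / len(pooled) for c in range(n_ch)]
    return [1.0 / (1.0 + math.exp(-m)) for m in mean]


def reweight(x, weights):
    """Scale each channel of x by its weight."""
    return [[[v * wt for v in row] for row in ch] for ch, wt in zip(x, weights)]


def depth_to_space(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r), pixel-shuffle ordering.

    Input channel c*r*r + dy*r + dx lands at output pixel
    (h*r + dy, w*r + dx) of output channel c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    return [
        [
            [x[c * r * r + (i % r) * r + (j % r)][i // r][j // r]
             for j in range(w * r)]
            for i in range(h * r)
        ]
        for c in range(c_in // (r * r))
    ]


def add_maps(a, b):
    """Element-wise sum of two equally shaped (C, H, W) maps."""
    return [
        [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]


def reweight_and_fuse(deep, levels, skip, r):
    """Reweight the deep map, trade channels for resolution, then fuse."""
    enhanced = reweight(deep, channel_weights(levels))
    return add_maps(depth_to_space(enhanced, r), skip)


def zeros(c, h, w):
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


# Toy inputs: a 4-channel 1x1 deep map and a 1-channel 2x2 higher-resolution map.
deep = [[[2.0]], [[4.0]], [[6.0]], [[8.0]]]
skip = [[[10.0, 10.0], [10.0, 10.0]]]
# All-zero level descriptors give a gate of sigmoid(0) = 0.5 for every channel.
fused = reweight_and_fuse(deep, [zeros(4, 2, 2), zeros(4, 1, 1)], skip, r=2)
print(fused)  # [[[11.0, 12.0], [13.0, 14.0]]]
```

In this toy run the four deep channels are halved by the gate and then laid out as one 2x2 map, so spatial resolution is recovered by rearranging values rather than by an extra upsampling convolution. Whether the paper's fusion block sums, concatenates, or uses another operator is not stated in the abstract.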
