{"title":"Multi-level Feature Reweighting and Fusion for Instance Segmentation","authors":"Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo","doi":"10.1109/INDIN51773.2022.9976099","DOIUrl":null,"url":null,"abstract":"Accurate instance segmentation requires high-resolution features for performing a dense pixel-wise prediction task. However, using high-resolution feature maps results in highly expensive model complexity and ineffective receptive fields. To overcome the problems of high-resolution features, conventional methods explore multi-level feature fusion that exchanges the information between low-level features at earlier layers and high-level features at top layers. Both low and high information is extracted by the hierarchical backbone network where high-level features contain more semantic cues and low-level features encompass more specific patterns. Thus, adopting these features to the training segmentation model is necessary, and designing a more efficient multi-level feature fusion is crucial. Existing methods balance such information by using top-down and bottom-up pathway connections with more inefficient convolution layers to produce richer multi-scale features. In this work, we contribute two folds: (1) a simple but effective multilevel feature reweighting layer is proposed to strengthen deep high-level features based on channel reweighting generated from multiple features of the backbone, and (2) an efficient fusion block is proposed to process low-resolution features in a depth-to-spatial manner and combine enhanced multi-level features together. These designs enable the segmentation models to predict instance kernels for mask generation on high-level feature maps. To verify the effectiveness of the proposed method, we conduct experiments on the challenging benchmark dataset MS-COCO. Surprisingly, our simple network outperforms the baseline in both accuracy and inference speed. More specifically, we achieve 35.4% APmask at 19.5 FPS on a GPU device, becoming a state-of-the-art instance segmentation method.","PeriodicalId":359190,"journal":{"name":"2022 IEEE 20th International Conference on Industrial Informatics (INDIN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 20th International Conference on Industrial Informatics (INDIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIN51773.2022.9976099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Accurate instance segmentation requires high-resolution features for dense pixel-wise prediction. However, using high-resolution feature maps incurs high model complexity and yields ineffective receptive fields. To overcome these problems, conventional methods explore multi-level feature fusion, which exchanges information between low-level features at earlier layers and high-level features at top layers. Both kinds of information are extracted by the hierarchical backbone network, where high-level features contain more semantic cues and low-level features encompass more specific patterns. Thus, adopting these features when training the segmentation model is necessary, and designing a more efficient multi-level feature fusion is crucial. Existing methods balance such information by using top-down and bottom-up pathway connections with additional, inefficient convolution layers to produce richer multi-scale features. In this work, our contributions are two-fold: (1) a simple but effective multi-level feature reweighting layer is proposed to strengthen deep high-level features based on channel reweighting generated from multiple backbone features, and (2) an efficient fusion block is proposed to process low-resolution features in a depth-to-spatial manner and combine the enhanced multi-level features. These designs enable the segmentation model to predict instance kernels for mask generation on high-level feature maps. To verify the effectiveness of the proposed method, we conduct experiments on the challenging MS-COCO benchmark. Surprisingly, our simple network outperforms the baseline in both accuracy and inference speed. More specifically, it achieves 35.4% mask AP at 19.5 FPS on a GPU device, making it a state-of-the-art instance segmentation method.
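To make the two contributions concrete, the following is a minimal PyTorch sketch of how they could be realized: a channel-reweighting layer that gates a high-level feature map with statistics pooled from multiple backbone levels, and a fusion block that upsamples low-resolution features via pixel shuffle (a depth-to-space rearrangement) before additive fusion. The module names, channel sizes, SE-style gating, and additive fusion are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only; module names, gating design, and fusion choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelReweight(nn.Module):
    """Reweight a high-level feature map with channel weights derived from
    globally pooled statistics of several backbone levels (assumed design)."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels * num_levels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, high: torch.Tensor, levels: list[torch.Tensor]) -> torch.Tensor:
        # Global average pooling of every backbone level -> one descriptor per level.
        stats = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in levels]
        weights = self.fc(torch.cat(stats, dim=1))          # (B, C) channel weights
        return high * weights.unsqueeze(-1).unsqueeze(-1)   # broadcast over H, W


class DepthToSpaceFusion(nn.Module):
    """Upsample a low-resolution feature with pixel shuffle (depth-to-space)
    and fuse it with a higher-resolution feature by summation (assumed design)."""

    def __init__(self, low_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        # Expand channels so pixel shuffle yields `out_channels` after rearrangement.
        self.expand = nn.Conv2d(low_channels, out_channels * scale * scale, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, low: torch.Tensor, high_res: torch.Tensor) -> torch.Tensor:
        up = self.shuffle(self.expand(low))  # depth-to-space upsampling
        return up + high_res                 # simple additive fusion


if __name__ == "__main__":
    b, c = 2, 256
    levels = [torch.randn(b, c, s, s) for s in (64, 32, 16)]   # multi-level backbone features
    rw = MultiLevelReweight(channels=c, num_levels=len(levels))
    reweighted = rw(levels[-1], levels)                         # reweighted 16x16 map
    fuse = DepthToSpaceFusion(low_channels=c, out_channels=c, scale=2)
    out = fuse(reweighted, levels[1])                           # fuse into the 32x32 level
    print(out.shape)                                            # torch.Size([2, 256, 32, 32])
```

In this reading, the reweighting layer adds only pooling and two linear layers per level, and the fusion block replaces interpolation-plus-convolution with a cheap 1x1 convolution and pixel shuffle, which is consistent with the abstract's claim of improving both accuracy and inference speed over the baseline.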