{"title":"Gateinst:在变压器解码器中使用多尺度门控增强查询进行实例分割","authors":"Chih-Wei Lin, Ye Lin, Shangtai Zhou, Lirong Zhu","doi":"10.1007/s00530-024-01438-1","DOIUrl":null,"url":null,"abstract":"<p>Recently, a popular query-based end-to-end framework has been used for instance segmentation. However, queries update based on individual layers or scales of feature maps at each stage of Transformer decoding, which makes queries unable to gather sufficient multi-scale feature information. Therefore, querying these features may result in inconsistent information due to disparities among feature maps and leading to erroneous updates. This study proposes a new network called GateInst, which employs a dual-path auto-select mechanism based on gate structures to overcome these issues. Firstly, we design a block-wise multi-scale feature fusion module that combines features of different scales while maintaining low computational cost. Secondly, we introduce the gated-enhanced queries Transformer decoder that utilizes a gating mechanism to filter and merge the queries generated at different stages to compensate for the inaccuracies in updating queries. GateInst addresses the issue of insufficient feature information and compensates for the problem of cumulative errors in queries. Experiments have shown that GateInst achieves significant gains of 8.4 <i>AP</i>, 5.5 <span>\\(AP_{50}\\)</span> over Mask2Former on the self-collected Tree Species Instance Dataset and performs well compared to non-Mask2Former-like and Mask2Former-like networks on self-collected and public COCO datasets, with only a tiny amount of additional computational cost and fast convergence. 
Code and models are available at https://github.com/FAFU-IMLab/GateInst.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"13 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gateinst: instance segmentation with multi-scale gated-enhanced queries in transformer decoder\",\"authors\":\"Chih-Wei Lin, Ye Lin, Shangtai Zhou, Lirong Zhu\",\"doi\":\"10.1007/s00530-024-01438-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Recently, a popular query-based end-to-end framework has been used for instance segmentation. However, queries update based on individual layers or scales of feature maps at each stage of Transformer decoding, which makes queries unable to gather sufficient multi-scale feature information. Therefore, querying these features may result in inconsistent information due to disparities among feature maps and leading to erroneous updates. This study proposes a new network called GateInst, which employs a dual-path auto-select mechanism based on gate structures to overcome these issues. Firstly, we design a block-wise multi-scale feature fusion module that combines features of different scales while maintaining low computational cost. Secondly, we introduce the gated-enhanced queries Transformer decoder that utilizes a gating mechanism to filter and merge the queries generated at different stages to compensate for the inaccuracies in updating queries. GateInst addresses the issue of insufficient feature information and compensates for the problem of cumulative errors in queries. 
Experiments have shown that GateInst achieves significant gains of 8.4 <i>AP</i>, 5.5 <span>\\\\(AP_{50}\\\\)</span> over Mask2Former on the self-collected Tree Species Instance Dataset and performs well compared to non-Mask2Former-like and Mask2Former-like networks on self-collected and public COCO datasets, with only a tiny amount of additional computational cost and fast convergence. Code and models are available at https://github.com/FAFU-IMLab/GateInst.</p>\",\"PeriodicalId\":51138,\"journal\":{\"name\":\"Multimedia Systems\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Multimedia Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00530-024-01438-1\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00530-024-01438-1","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Gateinst: instance segmentation with multi-scale gated-enhanced queries in transformer decoder
Recently, query-based end-to-end frameworks have become popular for instance segmentation. However, at each stage of Transformer decoding, queries are updated from individual layers or scales of the feature maps, so they cannot gather sufficient multi-scale feature information. Querying these features may therefore yield inconsistent information, owing to disparities among the feature maps, and lead to erroneous updates. This study proposes a new network, GateInst, which employs a dual-path auto-select mechanism based on gate structures to overcome these issues. First, we design a block-wise multi-scale feature fusion module that combines features of different scales while maintaining low computational cost. Second, we introduce a gated-enhanced queries Transformer decoder that uses a gating mechanism to filter and merge the queries generated at different stages, compensating for inaccuracies in query updates. GateInst thus addresses both insufficient feature information and the accumulation of errors in queries. Experiments show that GateInst achieves significant gains of 8.4 AP and 5.5 \(AP_{50}\) over Mask2Former on the self-collected Tree Species Instance Dataset, and performs well against both Mask2Former-like and non-Mask2Former-like networks on the self-collected and public COCO datasets, with only a small additional computational cost and fast convergence. Code and models are available at https://github.com/FAFU-IMLab/GateInst.
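The gating mechanism described above — filtering and merging queries produced at different decoder stages — can be pictured as a learned, per-channel convex combination of the old and new queries. Below is a minimal NumPy sketch under that assumption: a sigmoid gate is computed from the concatenated query pair and decides how much of each query survives. All names and the exact gate form are illustrative, not the paper's actual implementation (see the linked repository for that):

```python
import numpy as np

def gated_query_fusion(q_prev, q_new, w_gate, b_gate):
    """Merge decoder queries from two stages with a sigmoid gate.

    q_prev, q_new : (N, C) query embeddings from consecutive stages
    w_gate        : (2C, C) gate projection weights (hypothetical)
    b_gate        : (C,) gate bias

    Returns a (N, C) per-channel convex combination of the two query sets.
    """
    # Concatenate both query sets along the channel axis: (N, 2C)
    x = np.concatenate([q_prev, q_new], axis=-1)
    # Sigmoid gate in (0, 1), shape (N, C)
    g = 1.0 / (1.0 + np.exp(-(x @ w_gate + b_gate)))
    # Keep g of the previous query, (1 - g) of the new one
    return g * q_prev + (1.0 - g) * q_new
```

Because the output is a convex combination, every fused value stays between the corresponding old and new query values, which is one way a gate can damp an erroneous update from a single stage rather than overwrite the query outright.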
Journal introduction:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.