Jie Hu;Liujuan Cao;Xiaofeng Jin;Shengchuan Zhang;Rongrong Ji
Title: Universal Image Segmentation With Efficiency
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 8550-8562
Publication date: 2025-06-06
DOI: 10.1109/TPAMI.2025.3576857
URL: https://ieeexplore.ieee.org/document/11027444/
In this paper, we present UISE, a unified image segmentation framework that achieves efficient performance across various segmentation tasks, eliminating the need for multiple specialized pipelines. UISE employs dynamic convolutions between universal segmentation kernels and image feature maps, enabling a single pipeline for different tasks such as panoptic, instance, semantic, and video instance segmentation. To address computational requirements, we introduce a feature pyramid aggregator for image feature extraction and a separable dynamic decoder for generating segmentation kernels. The aggregator re-parameterizes interpolation-first modules in a convolution-first manner, resulting in a significant acceleration of the pipeline without incurring additional costs. The decoder incorporates multi-head cross-attention through separable dynamic convolution, enhancing both efficiency and accuracy. Extensive experiments are conducted to validate UISE’s performance across different segmentation tasks. To the best of our knowledge, UISE is the first universal segmentation framework that delivers competitive performance in terms of both speed and accuracy when compared to current state-of-the-art models.
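Two ideas in the abstract can be illustrated with a small sketch: producing masks by dynamic convolution between segmentation kernels and a feature map, and moving a convolution ahead of an interpolation step. The code below is not the authors' implementation. All function names, tensor shapes, and the use of 1x1 kernels with nearest-neighbour upsampling are assumptions chosen so the example stays exact and dependency-free. It shows only that a pointwise (1x1) convolution commutes with upsampling, so applying it first at low resolution gives the same output with fewer multiply-adds. Bilinear interpolation is also linear and acts per channel, so the same identity holds there up to floating-point rounding.

```python
# Illustrative sketch only: names, shapes, and nearest-neighbour upsampling are
# assumptions for this example, not details taken from the paper.

def conv1x1(feat, weight):
    """Pointwise convolution. feat: [C_in][H][W], weight: [C_out][C_in]."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [[sum(weight[o][c] * feat[c][y][x] for c in range(c_in))
          for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def upsample_nearest(feat, scale):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    return [
        [[chan[y // scale][x // scale] for x in range(len(chan[0]) * scale)]
         for y in range(len(chan) * scale)]
        for chan in feat
    ]


def conv1x1_macs(c_in, c_out, h, w):
    """Multiply-accumulate count of a 1x1 convolution on an h x w map."""
    return c_in * c_out * h * w


def dynamic_conv_masks(kernels, feat):
    """Mask logits from per-segment kernels. kernels: [N][C], feat: [C][H][W].

    The arithmetic is a 1x1 convolution; "dynamic" means the weights are
    predicted per input rather than stored as fixed parameters.
    """
    return conv1x1(feat, kernels)


# Toy feature map with C=2 channels and a 2x2 spatial grid.
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.5, -1.0], [2.0, 0.0]]]
weight = [[1.0, -1.0], [0.5, 2.0], [0.0, 1.0]]  # C_out=3, C_in=2

# Interpolation-first: upsample to 4x4, then convolve at high resolution.
interp_first = conv1x1(upsample_nearest(feat, 2), weight)
# Convolution-first: convolve at 2x2, then upsample the result.
conv_first = upsample_nearest(conv1x1(feat, weight), 2)

same_output = interp_first == conv_first
macs_interp_first = conv1x1_macs(2, 3, 4, 4)  # 96
macs_conv_first = conv1x1_macs(2, 3, 2, 2)    # 24

# One kernel that sums both channels yields one mask logit map.
masks = dynamic_conv_masks([[1.0, 1.0]], feat)

if __name__ == "__main__":
    print("outputs identical:", same_output)
    print("MACs interp-first vs conv-first:", macs_interp_first, macs_conv_first)
    print("mask logits:", masks)
```

In this toy setting the convolution-first ordering needs a quarter of the multiply-adds for a 2x upsampling factor, because the convolution runs on a map with a quarter of the pixels. Whether the paper's aggregator gains by the same ratio depends on its actual layers, which the abstract does not specify.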