CTFS: A consolidated transformer framework for instance and semantic segmentation tasks

Kun Dai, Fuyuan Qiu, Hongbo Gao, Tao Xie, Chuqing Cao, Ruifeng Li, Lijun Zhao, Ke Wang

Neural Networks, Volume 191, Article 107745. Published 2025-06-24. DOI: 10.1016/j.neunet.2025.107745
Citations: 0
Abstract
Instance segmentation and semantic segmentation are fundamental tasks that support many computer vision applications. Recently, researchers have investigated the feasibility of constructing a unified transformer framework and leveraging multi-task learning techniques to optimize instance and semantic segmentation tasks simultaneously. However, these methods learn the proportion and distribution of task-shared parameters concurrently during training, which makes it difficult to optimize the network sufficiently. In addition, conventional gradient rectification algorithms attempt to resolve gradient conflicts at the level of whole gradient vectors, but they fall short of adequately resolving conflicts among the individual elements within those vectors. In this study, we develop a consolidated transformer framework, CTFS, to address these issues. For the first issue, we introduce an affinity-guided sharing strategy (AGSS) that learns the proportion and distribution of task-shared parameters in two separate stages. This approach uses the proportion of task-shared parameters as prior knowledge to guide the subsequent learning process, reducing the difficulty of network optimization. For the second issue, we propose a fine-grained gradient rectification strategy (FGRS) that mitigates gradient conflicts for each element of the gradient vectors during backpropagation. Built upon the standard Swin Transformer without complicating its network architecture, CTFS achieves impressive performance on both the COCO dataset for instance segmentation and the ADE20K dataset for semantic segmentation.
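To make the element-wise versus whole-vector distinction concrete, the sketch below shows a toy element-wise conflict rule for two task gradients: components whose signs agree are summed, while conflicting components are zeroed out. This is only an illustration of the general idea under assumed semantics; the abstract does not specify the actual FGRS update rule, and the function name `elementwise_rectify` is hypothetical.

```python
import numpy as np

def elementwise_rectify(g1, g2):
    """Combine two task gradients, resolving sign conflicts per element.

    Where the gradients agree in sign, their sum is kept; where they
    conflict, the component is dropped so neither task is pushed in a
    harmful direction. A toy illustration of element-wise (rather than
    whole-vector) conflict handling, not the paper's FGRS algorithm.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    agree = (g1 * g2) >= 0          # True where signs do not conflict
    return np.where(agree, g1 + g2, 0.0)

# Gradients for the two tasks conflict only in the second element.
g_instance = np.array([0.5, -1.0, 0.2])
g_semantic = np.array([0.3,  2.0, 0.1])
print(elementwise_rectify(g_instance, g_semantic))  # [0.8 0.  0.3]
```

In contrast, a whole-vector method such as projection-based rectification would treat the two gradients as conflicting (or not) based on a single inner product, modifying every component at once even when only one element disagrees.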
About the journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.