CTFS: A consolidated transformer framework for instance and semantic segmentation tasks

Kun Dai, Fuyuan Qiu, Hongbo Gao, Tao Xie, Chuqing Cao, Ruifeng Li, Lijun Zhao, Ke Wang

Neural Networks, Volume 191, Article 107745. Published 2025-06-24. DOI: 10.1016/j.neunet.2025.107745
Citations: 0
Abstract
Instance segmentation and semantic segmentation are fundamental tasks that support many computer vision applications. Recently, researchers have investigated the feasibility of constructing a unified transformer framework and leveraging multi-task learning techniques to optimize instance and semantic segmentation tasks simultaneously. However, these methods learn the proportion and distribution of task-shared parameters concurrently during training, which makes it difficult to optimize the network sufficiently. In addition, conventional gradient rectification algorithms attempt to resolve gradient conflicts at the level of whole gradient vectors, but they fall short of adequately resolving conflicts among the individual elements within those vectors. In this study, we develop a consolidated transformer framework, CTFS, to address these issues. For the first issue, we introduce an affinity-guided sharing strategy (AGSS) that learns the proportion and distribution of task-shared parameters in two separate stages. This approach uses the proportion of task-shared parameters as prior knowledge to guide the subsequent learning process, reducing the difficulty of network optimization. For the second issue, we propose a fine-grained gradient rectification strategy (FGRS) that mitigates gradient conflicts for each element of the gradient vectors during backpropagation. Built upon the standard Swin Transformer without complicating its network architecture, CTFS achieves impressive performance on both the COCO dataset for instance segmentation and the ADE20K dataset for semantic segmentation.
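To make the element-wise versus whole-vector distinction concrete, the sketch below shows a toy element-wise conflict rule for two task gradients: components whose signs agree are summed, while conflicting components are zeroed out. This is only an illustration of the general idea under assumed semantics; the abstract does not specify the actual FGRS update rule, and the function name `elementwise_rectify` is hypothetical.

```python
import numpy as np

def elementwise_rectify(g1, g2):
    """Combine two task gradients, resolving sign conflicts per element.

    Where the gradients agree in sign, their sum is kept; where they
    conflict, the component is dropped so neither task is pushed in a
    harmful direction. A toy illustration of element-wise (rather than
    whole-vector) conflict handling, not the paper's FGRS algorithm.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    agree = (g1 * g2) >= 0          # True where signs do not conflict
    return np.where(agree, g1 + g2, 0.0)

# Gradients for the two tasks conflict only in the second element.
g_instance = np.array([0.5, -1.0, 0.2])
g_semantic = np.array([0.3,  2.0, 0.1])
print(elementwise_rectify(g_instance, g_semantic))  # [0.8 0.  0.3]
```

In contrast, a whole-vector method such as projection-based rectification would treat the two gradients as conflicting (or not) based on a single inner product, modifying every component at once even when only one element disagrees.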
About the journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.