DiffMatter: Different frequency fusion for trimap-free image matting via edge detection
Anming Sun, Junjie Chang, Guilin Yao
Computer Vision and Image Understanding, vol. 259, Article 104424 (published 2025-06-16)
DOI: 10.1016/j.cviu.2025.104424
https://www.sciencedirect.com/science/article/pii/S107731422500147X
Abstract
Image matting extracts the foreground from a target image by predicting the alpha transparency of the foreground. Existing methods rely on constraints such as trimaps to distinguish the foreground from the background, which improves accuracy but inevitably incurs significant costs. This paper proposes a trimap-free automatic matting method that highlights the foreground region through edge detection. To address the domain-adaptation gap between edge information and the fine-grained features required for the matting task, we designed a plug-and-play Different Frequency Fusion module, following the paradigm of characteristic enhancement, feature fusion, and information integration, to effectively combine high-frequency components with their low-frequency counterparts, and we propose a matting model, DiffMatter. Specifically, in the characteristic-enhancement phase we designed texture-highlighting and semantic-enhancement modules for high-frequency and low-frequency information, respectively. For feature fusion we employed cross-fusion operations, and in the information-integration phase we integrated information across the spatial and channel dimensions. Additionally, to compensate for the shortcoming of transformers in capturing local information, we constructed an attention embedding module and proposed a cross-aware module, which exploit channel and spatial information, respectively, to enhance representational capability. Experimental results on the Composition-1k, Distinctions-646, and real-world AIM-500 datasets demonstrate that our model outperforms competing methods while achieving a balance between performance and computational efficiency. Furthermore, our Different Frequency Fusion module enhances several state-of-the-art matting models. The code will be publicly released.
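The abstract's core idea — decomposing features into low- and high-frequency components, enhancing each, then cross-fusing them — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`box_blur`, `frequency_split`, `cross_fuse`) and the specific modulation operations are hypothetical stand-ins chosen only to show the general decompose-enhance-fuse pattern.

```python
import numpy as np

def box_blur(x, k=3):
    # Simple box filter used here as a stand-in low-pass filter
    # (the paper's actual frequency decomposition may differ).
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def frequency_split(feat):
    # Split a 2-D feature map into a smooth low-frequency part
    # and a residual high-frequency part (edges and texture).
    low = box_blur(feat)
    high = feat - low
    return low, high

def cross_fuse(low, high, alpha=0.5):
    # Toy cross-fusion: each branch is modulated by the other,
    # echoing the "characteristic enhancement -> feature fusion ->
    # information integration" paradigm described in the abstract.
    low_enh = low + alpha * np.tanh(high)   # semantic branch receives edge cues
    high_enh = high * (1 + np.tanh(low))    # texture branch gated by semantics
    return low_enh + high_enh               # information integration

feat = np.random.rand(8, 8).astype(np.float32)
low, high = frequency_split(feat)
fused = cross_fuse(low, high)
print(fused.shape)  # (8, 8)
```

Note that the decomposition is exact by construction (`low + high` reconstructs the input), which is the usual sanity check for residual-based frequency splits.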
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems