DiffMatter: Different frequency fusion for trimap-free image matting via edge detection
Anming Sun, Junjie Chang, Guilin Yao
Computer Vision and Image Understanding, vol. 259, Article 104424 (published 2025-06-16)
DOI: 10.1016/j.cviu.2025.104424
https://www.sciencedirect.com/science/article/pii/S107731422500147X
Abstract
Image matting extracts the foreground from a target image by predicting the alpha transparency of the foreground. Existing methods rely on constraints such as trimaps to distinguish the foreground from the background, which improves accuracy but inevitably incurs significant costs. This paper proposes a trimap-free automatic matting method that highlights the foreground region through edge detection. To address the domain-adaptation gap between edge information and the fine-grained features required for the matting task, we designed a plug-and-play Different Frequency Fusion module, following the paradigm of characteristic enhancement, feature fusion, and information integration, to effectively combine high-frequency components with their low-frequency counterparts, and we propose a matting model, DiffMatter. Specifically, in the characteristic-enhancement phase we designed texture-highlighting and semantic-enhancement modules for high-frequency and low-frequency information, respectively. For feature fusion we employed cross-fusion operations, and in the information-integration phase we integrated information across the spatial and channel dimensions. Additionally, to compensate for the shortcoming of transformers in capturing local information, we constructed an attention embedding module and proposed a cross-aware module, which exploit channel and spatial information, respectively, to enhance representational capability. Experimental results on the Composition-1k, Distinctions-646, and real-world AIM-500 datasets demonstrate that our model outperforms competing methods while achieving a balance between performance and computational efficiency. Furthermore, our Different Frequency Fusion module enhances several state-of-the-art matting models. The code will be publicly released.
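The abstract's core idea — decomposing features into low- and high-frequency components, enhancing each, then cross-fusing them — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`box_blur`, `frequency_split`, `cross_fuse`) and the specific modulation operations are hypothetical stand-ins chosen only to show the general decompose-enhance-fuse pattern.

```python
import numpy as np

def box_blur(x, k=3):
    # Simple box filter used here as a stand-in low-pass filter
    # (the paper's actual frequency decomposition may differ).
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def frequency_split(feat):
    # Split a 2-D feature map into a smooth low-frequency part
    # and a residual high-frequency part (edges and texture).
    low = box_blur(feat)
    high = feat - low
    return low, high

def cross_fuse(low, high, alpha=0.5):
    # Toy cross-fusion: each branch is modulated by the other,
    # echoing the "characteristic enhancement -> feature fusion ->
    # information integration" paradigm described in the abstract.
    low_enh = low + alpha * np.tanh(high)   # semantic branch receives edge cues
    high_enh = high * (1 + np.tanh(low))    # texture branch gated by semantics
    return low_enh + high_enh               # information integration

feat = np.random.rand(8, 8).astype(np.float32)
low, high = frequency_split(feat)
fused = cross_fuse(low, high)
print(fused.shape)  # (8, 8)
```

Note that the decomposition is exact by construction (`low + high` reconstructs the input), which is the usual sanity check for residual-based frequency splits.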
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems