Trinity Detector: Text-Assisted and Attention Mechanisms Based Spectral Fusion for Diffusion Generation Image Detection
Authors: Jiawei Song; Dengpan Ye; Yunming Zhang
Journal: IEEE Signal Processing Letters, vol. 32, pp. 501-505 (published 2024-12-26)
DOI: 10.1109/LSP.2024.3522851
URL: https://ieeexplore.ieee.org/document/10816560/
Article type: Journal Article; impact factor 3.2; JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC); CAS region 2 (Engineering and Technology); not open access
Citations: 0
Abstract
Artificial Intelligence Generated Content (AIGC) techniques, represented by text-to-image generation, have enabled the malicious use of deep forgeries, raising concerns about the trustworthiness of multimedia content. Experimental results show that traditional forgery detection methods adapt poorly to images generated by diffusion models, while existing diffusion-specific techniques lack robustness against post-processed images. In response, we propose the Trinity Detector, which integrates coarse-grained text features from a Contrastive Language-Image Pretraining (CLIP) encoder with fine-grained artifacts in the pixel domain to achieve semantic-level image detection, significantly enhancing model robustness. To make the model more sensitive to the features of diffusion-generated images, we design a Multi-spectral Channel Attention Fusion Unit (MCAF). It adaptively fuses multiple preset frequency bands, dynamically adjusting the weight of each band, and then integrates the fused frequency-domain information with the spatial co-occurrence of the two modalities. Extensive experiments show that the Trinity Detector improves transfer detection performance on black-box datasets by an average of 14.3% over previous diffusion detection models and demonstrates superior performance on post-processed image datasets.
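The abstract describes MCAF only at a high level and gives no equations, so the following is a minimal NumPy sketch of the general idea under explicit assumptions; it is not the authors' implementation. It borrows DCT-based multi-spectral channel attention (the FcaNet style) as a swapped-in technique: each preset frequency band (u, v) is a 2-D DCT-II basis, every channel gets one descriptor per band, and the band descriptors are mixed with input-dependent softmax weights before a small MLP produces channel attention. The class name MultiSpectralFusionSketch, the concatenation of a hypothetical image-branch map and a hypothetical text-derived (CLIP-projected) map of equal spatial size, the random stand-in weights, and the spatial gate sigmoid(a) * sigmoid(b) used as a proxy for "spatial co-occurrence" are all illustrative assumptions.

```python
import numpy as np


def dct_basis(u: int, v: int, h: int, w: int) -> np.ndarray:
    """Orthonormal 2-D DCT-II basis function B_{u,v} on an h x w grid."""
    by = np.cos(np.pi * (2 * np.arange(h) + 1) * u / (2 * h))
    bx = np.cos(np.pi * (2 * np.arange(w) + 1) * v / (2 * w))
    cu = np.sqrt((1.0 if u == 0 else 2.0) / h)
    cv = np.sqrt((1.0 if v == 0 else 2.0) / w)
    return cu * cv * np.outer(by, bx)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class MultiSpectralFusionSketch:
    """Hypothetical MCAF-style block (assumption, not the paper's code):
    per-band DCT channel descriptors are mixed with input-dependent band
    weights, turned into channel attention, then gated spatially by where
    both modality halves respond."""

    def __init__(self, c_img, c_txt, bands, height, width, reduction=4, seed=0):
        rng = np.random.default_rng(seed)
        c = c_img + c_txt
        self.c_img = c_img
        # (K, H, W) stack of preset frequency bases, e.g. [(0, 0), (0, 1), (1, 0)].
        self.bases = np.stack([dct_basis(u, v, height, width) for u, v in bands])
        hidden = max(c // reduction, 1)
        # Random stand-ins for parameters that would be learned in training.
        self.band_query = rng.standard_normal(c) * 0.1
        self.w1 = rng.standard_normal((hidden, c)) * 0.1
        self.w2 = rng.standard_normal((c, hidden)) * 0.1

    def __call__(self, img_feat, txt_feat):
        x = np.concatenate([img_feat, txt_feat], axis=0)        # (C, H, W)
        desc = np.einsum("chw,khw->kc", x, self.bases)          # (K, C) per-band descriptors
        band_w = softmax(desc @ self.band_query)                # (K,) dynamic band weights
        fused = band_w @ desc                                   # (C,) fused spectral descriptor
        chan_w = sigmoid(self.w2 @ np.maximum(self.w1 @ fused, 0.0))  # (C,) channel attention
        y = x * chan_w[:, None, None]
        # Spatial gate: large only where both modality halves respond (co-occurrence proxy).
        a = y[: self.c_img].mean(axis=0)
        b = y[self.c_img:].mean(axis=0)
        spatial = sigmoid(a) * sigmoid(b)                       # (H, W)
        return y * spatial[None], band_w, chan_w
```

One design note: the (0, 0) band is a constant basis, so its descriptor is global average pooling scaled by sqrt(H * W); adding higher bands lets the attention react to mid- and high-frequency energy, which is where upsampling artifacts from generators often appear. Which bands the paper actually presets is not stated in the abstract.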
Journal description:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.