PKU-AIGI-500K: A Neural Compression Benchmark and Model for AI-Generated Images

IF 3.7 · Zone 2 (Engineering & Technology) · JCR Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-05 DOI:10.1109/JETCAS.2024.3385629

Xunxu Duan;Siwei Ma;Hongbin Liu;Chuanmin Jia

Vol. 14, No. 2, pp. 172-184. Not open access. Full text: https://ieeexplore.ieee.org/document/10493034/

Citations: 0

Abstract

In recent years, artificial intelligence-generated content (AIGC) enabled by foundation models has received increasing attention and is developing rapidly. Text prompts can now be converted into high-quality, photo-realistic images. This capability, however, imposes extremely high bandwidth requirements for compressing and transmitting the vast number of AI-generated images (AIGI) that such AIGC services produce. Despite this challenge, research on compression methods for AIGI is conspicuously lacking. This work addresses the gap by introducing the pioneering AIGI dataset, PKU-AIGI-500K, comprising more than 105k diverse prompts and more than 528k images derived from five major foundation models. Using this dataset, we analyze the essential characteristics of AIGC images and show empirically that existing data-driven lossy compression methods achieve sub-optimal rate-distortion performance without fine-tuning, primarily because of a domain shift between AIGIs and natural images. We comprehensively benchmark the rate-distortion performance and runtime complexity of openly available conventional and learned image coding solutions, uncovering new insights for emerging studies in AIGI compression. Moreover, to harness the redundant information in an AIGI and its corresponding text, we propose an AIGI compression model (Cross-Attention Transformer Codec, CATC) trained on this dataset as a strong baseline. Experimental results demonstrate that the proposed model achieves up to 30.09% bitrate reduction compared with the state-of-the-art (SOTA) H.266/VVC codec and outperforms the SOTA learned codec, paving the way for future research in AIGI compression.
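The abstract does not say how the "bitrate reduction" relative to H.266/VVC is measured. Such figures are conventionally reported as a Bjøntegaard delta rate (BD-rate), so the sketch below assumes that metric purely for illustration. The function name `bd_rate` and all rate/PSNR values are hypothetical and are not taken from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of a test codec vs. an anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate gap over the interval, converted to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Made-up rate-distortion points (bits per pixel, PSNR in dB), NOT from the paper.
anchor_bpp = [0.10, 0.25, 0.50, 1.00]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
# A hypothetical codec that spends 30% fewer bits at every quality level.
test_bpp = [0.7 * r for r in anchor_bpp]
test_psnr = anchor_psnr

print(f"BD-rate: {bd_rate(anchor_bpp, anchor_psnr, test_bpp, test_psnr):.2f}%")
```

Because the hypothetical test curve is the anchor curve shifted by a constant factor in rate, the result is close to -30% by construction. Real curves cross and diverge, which is why the metric averages over the shared quality range rather than comparing single operating points.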




Source journal

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore

8.50

Self-citation rate

2.20%

Number of publications

About the journal: The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.