Human–Machine Collaborative Image Compression Method Based on Implicit Neural Representations

IF 3.7 2区工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-09 DOI:10.1109/JETCAS.2024.3386639

Huanyang Li;Xinfeng Zhang

{"title":"Human–Machine Collaborative Image Compression Method Based on Implicit Neural Representations","authors":"Huanyang Li;Xinfeng Zhang","doi":"10.1109/JETCAS.2024.3386639","DOIUrl":null,"url":null,"abstract":"With the explosive increase in the volume of images intended for analysis by AI, image coding for machine have been proposed to transmit information in a machine-interpretable format, thereby enhancing image compression efficiency. However, such efficient coding schemes often lead to issues like loss of image details and features, and unclear semantic information due to high data compression ratio, making them less suitable for human vision domains. Thus, it is a critical problem to balance image visual quality and machine vision accuracy at a given compression ratio. To address these issues, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the transmitted information for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision against INR compression framework. To enhance the model’s perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embedding of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods in classification tasks while ensuring image compression efficiency.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 2","pages":"198-208"},"PeriodicalIF":3.7000,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10495030/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

With the explosive increase in the volume of images intended for analysis by AI, image coding for machine have been proposed to transmit information in a machine-interpretable format, thereby enhancing image compression efficiency. However, such efficient coding schemes often lead to issues like loss of image details and features, and unclear semantic information due to high data compression ratio, making them less suitable for human vision domains. Thus, it is a critical problem to balance image visual quality and machine vision accuracy at a given compression ratio. To address these issues, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the transmitted information for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision against INR compression framework. To enhance the model’s perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embedding of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods in classification tasks while ensuring image compression efficiency.

查看原文本刊更多论文

基于隐式神经表征的人机协作图像压缩方法

随着用于人工智能分析的图像数量呈爆炸式增长，人们提出了机器图像编码方案，以机器可解读的格式传输信息，从而提高图像压缩效率。然而，这种高效的编码方案往往会导致图像细节和特征的丢失，以及由于高数据压缩比导致语义信息不清晰等问题，使其不太适合人类视觉领域。因此，如何在给定的压缩比下平衡图像视觉质量和机器视觉精度是一个关键问题。为了解决这些问题，我们引入了基于隐式神经表征（INR）的人机协作图像编码框架，该框架在解码端有效减少了机器视觉任务的传输信息，同时在 INR 压缩框架下保持了人类视觉的高效图像压缩。为了增强机器视觉模型对图像的感知，我们设计了一个语义嵌入增强模块，以帮助理解图像语义。具体来说，我们采用 Swin Transformer 模型来初始化图像特征，确保压缩模型的嵌入有效地适用于下游视觉任务。广泛的实验结果表明，我们的方法在分类任务中明显优于其他图像压缩方法，同时确保了图像压缩效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ENGINEERING, ELECTRICAL & ELECTRONIC-

CiteScore

8.50

自引率

2.20%

发文量

期刊介绍： The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.