Learned Image Coding for Human-Machine Collaborative Optimization

IF 4.8 | Zone 1, Computer Science | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Broadcasting Pub Date : 2024-08-21 DOI:10.1109/TBC.2024.3443470

Jingbo He;Xiaohai He;Shuhua Xiong;Honggang Chen

Vol. 71, No. 1, pp. 203-216 | Journal Article | https://ieeexplore.ieee.org/document/10643150/

Citations: 0

Abstract

Learned Image Coding for Human-Machine Collaborative Optimization

The exponential growth in the volume of image data has put immense pressure on transmission and storage systems, while also creating opportunities for intelligent image analysis in machine vision. In recent years, learned image coding approaches have advanced remarkably and achieved impressive performance. Applying learned image coding to machine vision holds promise for human-machine collaboration. In this paper, we propose a learned image coding approach based on a Transformer-CNN interaction structure for human-machine vision collaborative optimization, which generates a single, compact bitstream for efficient image representation. The bitstream can be decoded directly to produce a reconstructed image for human visual perception. In parallel, the bitstream can serve as input to machine vision tasks without the image being decoded and reconstructed. This not only reduces computational cost at the decoding end but also improves the efficiency of machine analysis. Experimental results demonstrate that the proposed method produces a single bitstream that serves both image reconstruction and machine task analysis, delivering high accuracy on machine tasks and superior reconstructed image quality compared with state-of-the-art (SOTA) methods.
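The single-bitstream, two-consumer idea described in the abstract can be illustrated with a toy sketch. This is not the authors' Transformer-CNN model: the pair-averaging "encoder", the quantization step `STEP`, and the function names `encode`, `decode_for_human`, and `analyze_for_machine` are all hypothetical stand-ins, chosen only to show one bitstream feeding a reconstruction path and a separate analysis path that never rebuilds pixels.

```python
from typing import List

STEP = 0.25  # hypothetical quantization step; pixels are assumed to lie in [0, 1]


def encode(pixels: List[float]) -> bytes:
    """Toy stand-in for a learned encoder: average adjacent pixel pairs, then quantize.

    Assumes an even number of pixels; a trailing odd pixel would be dropped.
    """
    latent = [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels) - 1, 2)]
    return bytes(min(255, max(0, round(v / STEP))) for v in latent)


def decode_for_human(bitstream: bytes) -> List[float]:
    """Human path: dequantize each symbol and upsample back to pixel resolution."""
    out: List[float] = []
    for symbol in bitstream:
        out.extend([symbol * STEP, symbol * STEP])
    return out


def analyze_for_machine(bitstream: bytes) -> str:
    """Machine path: a toy classifier that reads the symbols directly.

    No pixel-domain image is reconstructed on this path.
    """
    mean_value = sum(bitstream) / len(bitstream) * STEP
    return "bright" if mean_value > 0.5 else "dark"


if __name__ == "__main__":
    stream = encode([1.0, 1.0, 0.5, 0.5])  # one bitstream, shared by both paths
    print(decode_for_human(stream))        # reconstruction for a human viewer
    print(analyze_for_machine(stream))     # analysis without reconstruction
```

In a real learned codec both paths would be neural networks trained jointly; the sketch only mirrors the data flow, in which the machine path skips the synthesis step and therefore avoids its decoding cost.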

Source journal

IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)

CiteScore

9.40

Self-citation rate

31.10%

Articles published

Review time

6-12 weeks

Journal scope: The Society’s Field of Interest (FOI) is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which guides the Publications Committee in selecting content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”