Task-Adapted Learnable Embedded Quantization for Scalable Human-Machine Image Compression

Authors: Shaohui Li; Shuoyu Ma; Wenrui Dai; Nuowen Kan; Fan Cheng; Chenglin Li; Junni Zou; Hongkai Xiong
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 5, pp. 4768-4783
DOI: 10.1109/TCSVT.2025.3525664
Publication date: 2025-01-03
Journal impact factor: 8.3 (JCR Q1, Engineering, Electrical & Electronic)
URL: https://ieeexplore.ieee.org/document/10824850/
Abstract: Image compression for both human and machine vision has become prevalent to accommodate the rising demand for machine-to-machine and human-to-machine communications. Scalable human-machine image compression has recently emerged as an efficient alternative that simultaneously achieves high accuracy for machine vision in the base layer and high-fidelity reconstruction for human vision in the enhancement layer. However, existing methods achieve scalable coding with heuristic mechanisms that cannot fully exploit inter-layer correlations and noticeably sacrifice rate-distortion performance. In this paper, we propose task-adapted learnable embedded quantization to address this problem in an analytically optimized fashion. We first reveal the relationship between the latent representations for machine and human vision, and demonstrate that the optimal representation for machine vision can be approximated by post-training optimization of the learned representation for human vision. On this basis, we propose task-adapted learnable embedded quantization, which leverages a learnable step predictor to adaptively determine the optimal quantization step for diverse machine vision tasks, so that the inter-layer correlations between the representations for human and machine vision are sufficiently exploited through embedded quantization. Furthermore, we develop a human-machine scalable coding framework by incorporating the proposed embedded quantization into pre-trained learned image compression models. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance on machine vision tasks such as object detection, instance segmentation, and panoptic segmentation, with negligible loss in rate-distortion performance for human vision.
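For readers unfamiliar with embedded (nested) quantization, the base/enhancement split described in the abstract can be sketched in a few lines of plain Python. This is an illustrative toy, not the paper's coder: the scalar rounding, the step value, and the `refine_factor` are assumptions made here for illustration, and the paper's contribution is precisely that the base-layer step is predicted per machine-vision task by a learnable module rather than fixed by hand.

```python
def embedded_quantize(x, base_step, refine_factor=4):
    """Quantize scalar x into a coarse base-layer index plus an
    enhancement-layer index for the residual on a nested finer grid."""
    fine_step = base_step / refine_factor
    base_idx = round(x / base_step)          # base layer (machine vision)
    residual = x - base_idx * base_step      # what the base layer misses
    enh_idx = round(residual / fine_step)    # enhancement layer (human vision)
    return base_idx, enh_idx

def reconstruct(base_idx, enh_idx, base_step, refine_factor=4):
    """Return the base-layer and full (base + enhancement) reconstructions."""
    fine_step = base_step / refine_factor
    base_rec = base_idx * base_step
    full_rec = base_rec + enh_idx * fine_step
    return base_rec, full_rec

# The enhancement layer only refines the base-layer error, so the two
# layers share information instead of coding the latent twice.
base_idx, enh_idx = embedded_quantize(0.7, base_step=1.0)
base_rec, full_rec = reconstruct(base_idx, enh_idx, base_step=1.0)
assert abs(0.7 - base_rec) <= 0.5 + 1e-9      # coarse-grid error bound
assert abs(0.7 - full_rec) <= 0.125 + 1e-9    # nested fine-grid error bound
```

Because the fine grid is nested inside the coarse one, the enhancement bitstream is a strict refinement of the base bitstream; this is what lets a scalable codec serve machine vision from the base layer alone while the full stream recovers a high-fidelity image.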
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.