{"title":"利用面向频率的变换进行端到端优化图像压缩","authors":"Yuefeng Zhang, Kai Lin","doi":"10.1007/s00138-023-01507-x","DOIUrl":null,"url":null,"abstract":"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"33 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"End-to-end optimized image compression with the frequency-oriented transform\",\"authors\":\"Yuefeng Zhang, Kai Lin\",\"doi\":\"10.1007/s00138-023-01507-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>\",\"PeriodicalId\":51116,\"journal\":{\"name\":\"Machine Vision and Applications\",\"volume\":\"33 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-02-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Vision and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00138-023-01507-x\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-023-01507-x","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
End-to-end optimized image compression with the frequency-oriented transform
Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.
期刊介绍:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.