Interpretable Image/Video Compression by Extracting the Least Context Map

Huan Huang, Wei Yang
{"title":"Interpretable Image/Video Compression by Extracting the Least Context Map","authors":"Huan Huang, Wei Yang","doi":"10.1145/3603781.3603841","DOIUrl":null,"url":null,"abstract":"Current deep neural networks based image compression methods lack interpretability. Most of them follow a standard encoder-decoder framework and cannot be directly applied to video compression. We present a novel and interpretable finder-generator framework for image/video compression. The finder analyses the input image and selects important points on a one-channel binary map of the original width and height rather than compresses images into a multi-channel bitstream in a downsampled bottleneck layer. The binary one-channel map output by the finder retains the original width and height to keep the spatial information. We name it the least context map (LCM). The generator analyses the LCM to restore the original image based on its trained parameters. We put forward two different selection strategies for guiding the finder to extract the LCM. By extracting LCMs from images, our framework can reduce the size of real-world traffic surveillance videos by 96% compared to most common video codecs and by 85% compared to the next generation video compression codec VP9. This size reduction results from that adjacent frames always share similar LCMs and thus LCMs can be significantly compressed along the time axis. In addition, extensive experiments on Kodak dataset demonstrate our model surpasses the state-of-the-art image compression methods at low bit-rates. We only require an average compressed size of 2.01 kilobytes to achieve a high average MS-SSIM score of 0.9. This size is 50% smaller than JPEG, 43% smaller than FRRNN, and 11% smaller than WebP. Further comparative experiments on image generation demonstrate the LCM is superior to the semantic map and the edge map in higher information capacity and less required storage.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603841","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Current deep-neural-network-based image compression methods lack interpretability. Most follow a standard encoder-decoder framework and cannot be applied directly to video compression. We present a novel, interpretable finder-generator framework for image/video compression. Rather than compressing the image into a multi-channel bitstream in a downsampled bottleneck layer, the finder analyses the input image and selects important points on a one-channel binary map with the original width and height. Because this binary map preserves the original spatial dimensions, it retains the image's spatial information; we name it the least context map (LCM). The generator then analyses the LCM and restores the original image using its trained parameters. We put forward two selection strategies for guiding the finder to extract the LCM. By extracting LCMs from images, our framework reduces the size of real-world traffic surveillance videos by 96% compared with the most common video codecs and by 85% compared with the next-generation codec VP9. This reduction arises because adjacent frames share similar LCMs, so LCMs can be compressed heavily along the time axis. In addition, extensive experiments on the Kodak dataset demonstrate that our model surpasses state-of-the-art image compression methods at low bit-rates: an average compressed size of only 2.01 kilobytes achieves a high average MS-SSIM of 0.9, which is 50% smaller than JPEG, 43% smaller than FRRNN, and 11% smaller than WebP. Further comparative experiments on image generation show that the LCM is superior to the semantic map and the edge map, offering higher information capacity while requiring less storage.
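To make the finder-generator idea concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: a finder that maps an image to a binary one-channel map with the same width and height, and a generator that reconstructs the image from that map alone. The paper does not publish its architecture here, so the layer sizes, class names, and straight-through binarization below are assumptions for illustration only.

```python
# A minimal sketch of the finder-generator framework described in the abstract.
# Layer sizes and the straight-through binarization are assumptions, not the
# authors' published architecture.
import torch
import torch.nn as nn


class Finder(nn.Module):
    """Maps a 3-channel image to a binary one-channel map of the same H x W."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # one-channel logits
        )

    def forward(self, x):
        probs = torch.sigmoid(self.net(x))
        hard = (probs > 0.5).float()
        # Hard 0/1 values in the forward pass; the straight-through trick lets
        # gradients flow through `probs` during training (an assumed choice).
        return hard + probs - probs.detach()


class Generator(nn.Module):
    """Restores an RGB image from the binary least context map (LCM)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, lcm):
        return self.net(lcm)


if __name__ == "__main__":
    with torch.no_grad():
        x = torch.rand(1, 3, 64, 64)   # dummy input image
        lcm = Finder()(x)              # binary map, same spatial size as x
        x_hat = Generator()(lcm)       # reconstruction from the LCM alone
    print(lcm.shape, lcm.unique(), x_hat.shape)
```

Note how the finder never downsamples: unlike a bottleneck autoencoder, the LCM keeps the input's full spatial resolution, which is what makes the selected points interpretable as image locations.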
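The video-side savings rest on one observation in the abstract: adjacent frames yield similar LCMs, so the binary maps are highly redundant along the time axis. The toy sketch below is not the paper's actual coder; it merely illustrates, on synthetic data, why small frame-to-frame differences make temporal compression of binary maps cheap, here by XOR-ing consecutive maps and run-length encoding the sparse residuals.

```python
# Toy illustration (not the paper's coder) of temporal redundancy in LCMs:
# XOR adjacent binary maps, then run-length encode the sparse residual.
# All data below is synthetic.
import numpy as np


def run_length_encode(bits: np.ndarray) -> list[tuple[int, int]]:
    """Encode a flat 0/1 array as (value, run_length) pairs."""
    flat = bits.ravel()
    change = np.flatnonzero(np.diff(flat)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [flat.size]))
    return [(int(flat[s]), int(e - s)) for s, e in zip(starts, ends)]


def temporal_encode(lcms: np.ndarray) -> list[list[tuple[int, int]]]:
    """Store the first LCM whole, then only XOR residuals between frames."""
    streams = [run_length_encode(lcms[0])]
    for prev, cur in zip(lcms[:-1], lcms[1:]):
        streams.append(run_length_encode(prev ^ cur))  # sparse if frames are similar
    return streams


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = (rng.random((256, 256)) < 0.05).astype(np.uint8)  # sparse binary map
    # Simulate adjacent frames whose LCMs differ only slightly.
    frames = np.stack([
        base ^ (rng.random((256, 256)) < 0.002).astype(np.uint8)
        for _ in range(8)
    ])
    streams = temporal_encode(frames)
    print("runs per frame:", [len(s) for s in streams])
    # The first frame needs thousands of runs; each residual needs far fewer.
```

Any binary-map coder with this shape exhibits the behaviour the abstract reports: the cost of a video collapses toward the cost of its first LCM plus small per-frame residuals.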