Recent Advances in End-to-End Learned Image and Video Compression

Wen-Hsiao Peng, H. Hang
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), 2020-12-01
DOI: 10.1109/VCIP49819.2020.9301753 (https://doi.org/10.1109/VCIP49819.2020.9301753)
Citations: 1

Abstract

The DCT-based transform coding technique has been adopted by international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades.

Recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency.

In the first part of this tutorial, we shall (1) briefly summarize the progress on this topic over the past 3 or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve PSNR similar to that of BPG (Better Portable Graphics, the H.265 still image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to that of HEVC/H.265. The CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
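
The abstract compares codecs by PSNR and by rate-distortion performance without defining either. As a rough illustration only, the sketch below computes PSNR from the mean squared error and evaluates a Lagrangian cost of the form R + lambda * D, which is a common way to express the rate-distortion trade-off. The function names, the 8-bit peak value of 255, and the weight `lam` are assumptions made for this example; they are not taken from the tutorial.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit samples)."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / err)


def rd_cost(bits_per_pixel, distortion, lam):
    """Lagrangian rate-distortion cost R + lam * D.

    `lam` is a hypothetical trade-off weight chosen for illustration.
    """
    return bits_per_pixel + lam * distortion


if __name__ == "__main__":
    src = [10, 20, 30, 40]
    rec = [11, 21, 29, 39]  # every sample is off by one, so MSE is 1
    print(f"PSNR: {psnr(src, rec):.2f} dB")
    print(f"RD cost: {rd_cost(0.5, mse(src, rec), lam=0.01):.3f}")
```

A codec is usually evaluated at several rate points rather than one, so in practice these quantities would be computed per operating point and compared as curves.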