为人类和机器学习可扩展的视频编码。

IF 2.4 4区 计算机科学
Hadi Hadizadeh, Ivan V Bajić
{"title":"为人类和机器学习可扩展的视频编码。","authors":"Hadi Hadizadeh, Ivan V Bajić","doi":"10.1186/s13640-024-00657-w","DOIUrl":null,"url":null,"abstract":"<p><p>Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"2024 1","pages":"41"},"PeriodicalIF":2.4000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564357/pdf/","citationCount":"0","resultStr":"{\"title\":\"Learned scalable video coding for humans and machines.\",\"authors\":\"Hadi Hadizadeh, Ivan V Bajić\",\"doi\":\"10.1186/s13640-024-00657-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.</p>\",\"PeriodicalId\":49322,\"journal\":{\"name\":\"Eurasip Journal on Image and Video Processing\",\"volume\":\"2024 1\",\"pages\":\"41\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564357/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Eurasip Journal on Image and Video Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1186/s13640-024-00657-w\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/11/14 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Image and Video Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13640-024-00657-w","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/14 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

视频编码的开发历来是为了支持视频流、视频会议、数字电视等服务。其主要目的是让人类能够观看编码内容。然而,随着深度神经网络(DNN)的发展,编码视频越来越多地用于机器自动视频分析。在自动交通监控等应用中,车辆检测、跟踪和计数等分析将持续运行,而人类可能需要偶尔查看,以审查潜在的事故。为支持此类应用,需要一种新的视频编码范例,以可扩展的方式促进视频的高效表示和压缩,供机器和人类使用。在本手稿中,我们介绍了一种端到端可学习视频编解码器,它的基础层支持机器视觉任务,而增强层则与基础层一起支持供人类观看的输入重构。所提出的系统是基于条件编码概念构建的,以实现更好的压缩增益。在四个标准视频数据集上进行的综合实验评估表明,我们的框架在基础层上的表现优于最先进的学习型视频编解码器和传统视频编解码器,同时在增强层上的人类视觉任务中保持了相当的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Learned scalable video coding for humans and machines.

Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Eurasip Journal on Image and Video Processing
Eurasip Journal on Image and Video Processing Engineering-Electrical and Electronic Engineering
CiteScore
7.10
自引率
0.00%
发文量
23
审稿时长
6.8 months
期刊介绍: EURASIP Journal on Image and Video Processing is intended for researchers from both academia and industry, who are active in the multidisciplinary field of image and video processing. The scope of the journal covers all theoretical and practical aspects of the domain, from basic research to development of application.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信