Learning-based compression of visual objects for smart surveillance

Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção
Published in: 2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)
Publication date: 2022-04-19
DOI: 10.1109/IPTA54936.2022.9784147 (https://doi.org/10.1109/IPTA54936.2022.9784147)
Citations: 1

Abstract

Advanced video applications in smart environments (e.g., smart cities) bring new challenges, associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, industrial computer vision, medicine and others. As a consequence, a huge amount of visual data is captured to be analyzed by machines driven by task algorithms. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects that are captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects (persons and faces) and two task algorithms (class identification and object recognition), in addition to traditional image quality metrics such as PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when they are performed on visual objects compressed with the proposed scheme than when they are performed on the same objects compressed with VVC. These results demonstrate that the learning-based approach proposed in this paper is a more efficient solution for the compression of visual objects than standard encoding.
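The BD-rate figure quoted above is a Bjøntegaard-delta rate: the average bitrate difference between two codecs at equal quality, taken over the quality range that both rate-distortion curves cover. The following is a minimal sketch of that idea, not the authors' evaluation code. It assumes piecewise-linear interpolation of log-rate against quality in place of the cubic fit that is customary for this metric, and every rate and PSNR value in it is a hypothetical placeholder rather than data from the paper.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x, for xs sorted in strictly increasing order."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the quality range of the curve")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average of log(rate) over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])      # sort by quality
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = min(lo + i * h, hi)                   # guard against rounding past hi
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * _interp(q, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Simplified BD-rate in percent; negative means `test` needs fewer bits.

    `anchor` and `test` are lists of (bitrate, quality) pairs, e.g. (kbps, PSNR).
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves share no quality range")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical rate-distortion points (bitrate in kbps, PSNR in dB).
anchor_curve = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0), (800.0, 39.0)]
test_curve = [(rate / 2.0, psnr) for rate, psnr in anchor_curve]

if __name__ == "__main__":
    print(f"BD-rate: {bd_rate(anchor_curve, test_curve):.2f}%")
```

In this toy case the test codec reaches each quality level at half the anchor's bitrate, so the result is a 50% reduction. Averaging in the log-rate domain is what makes the result a relative (percentage) saving rather than an absolute one.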