Efficient Feature Compression for the Object Tracking Task

R. Henzel, K. Misra, Tianying Ji
2022 IEEE International Conference on Image Processing (ICIP), published 2022-10-16
DOI: 10.1109/ICIP46576.2022.9897802
Citations: 0

Abstract

In object tracking systems, clients often capture video, encode it, and transmit it to a server that performs the actual machine task. In this paper we propose an alternative architecture in which we instead transmit features to the server. Specifically, we partition the Joint Detection and Embedding (JDE) person-tracking network into client-side and server-side sub-networks and code the intermediate tensors, i.e., the features. The features are compressed for transmission using a Deep Neural Network (DNN) that we design and train specifically for carrying out the tracking task. The DNN uses trainable non-uniform quantizers, conditional probability estimators, and hierarchical coding, concepts that have previously been used in neural-network-based image and video compression. Additionally, the DNN includes a novel parameterized dual-path layer that comprises an autoencoder in one path and a convolution layer in the other. The tensors output by the two paths are added before being consumed by subsequent layers. The parameter value of this dual-path layer controls the output channel count and, correspondingly, the bitrate of the transmitted bitstream. We demonstrate that our model improves coding efficiency by 43.67% over the state-of-the-art Versatile Video Coding standard, which codes the source video in the pixel domain.
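The dual-path idea in the abstract can be sketched in a few lines: one path is a channel bottleneck (a toy autoencoder), the other a plain channel mix standing in for the convolution, and their outputs are summed, with the output channel count as the bitrate-controlling parameter. This is a minimal NumPy illustration under assumed shapes and 1x1 kernels, not the authors' implementation; the class name, dimensions, and activation are all hypothetical.

```python
import numpy as np

class DualPathLayer:
    """Toy sketch of a parameterized dual-path layer.

    One path is a small autoencoder (channel bottleneck), the other a
    1x1 convolution (a per-pixel channel mix); their outputs are added.
    `out_channels` sets the width of the transmitted tensor and hence
    influences the bitrate of the coded features.
    """

    def __init__(self, in_channels, out_channels, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # Autoencoder path: encode to a bottleneck, decode to out_channels.
        self.enc = rng.standard_normal((bottleneck, in_channels)) * 0.1
        self.dec = rng.standard_normal((out_channels, bottleneck)) * 0.1
        # Convolution path: a 1x1 conv over channels is a matrix multiply.
        self.conv = rng.standard_normal((out_channels, in_channels)) * 0.1

    def __call__(self, x):
        # x has shape (in_channels, H, W).
        c, h, w = x.shape
        flat = x.reshape(c, h * w)                              # (C_in, H*W)
        ae_path = self.dec @ np.maximum(self.enc @ flat, 0.0)   # ReLU bottleneck
        conv_path = self.conv @ flat
        # The two paths are summed before subsequent layers consume the result.
        return (ae_path + conv_path).reshape(-1, h, w)

layer = DualPathLayer(in_channels=64, out_channels=16, bottleneck=4)
out = layer(np.zeros((64, 8, 8)))
print(out.shape)  # (16, 8, 8)
```

Shrinking `out_channels` narrows the tensor handed to the entropy coder, which is how a single parameter of this layer can trade task accuracy against transmitted bitrate.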