Li Wang;Xin Wu;Yi Zhang;Xinyun Zhang;Lianming Xu;Zhihua Wu;Aiguo Fei
DeepAdaIn-Net: Deep Adaptive Device-Edge Collaborative Inference for Augmented Reality
IEEE Journal of Selected Topics in Signal Processing, vol. 17, no. 5, pp. 1052-1063
Published: 2023-09-22
DOI: 10.1109/JSTSP.2023.3312914
Citations: 0
Abstract
Object inference for augmented reality (AR) requires precise object localization within the user's physical environment and adaptability to dynamic communication conditions. Deep learning (DL) is well suited to capturing the highly nonlinear features of diverse data sources drawn from complex objects. However, existing DL techniques may suffer disfluency or instability when deployed on resource-constrained devices under poor communication conditions, degrading the user experience. This article addresses these issues by proposing a deep adaptive inference network, DeepAdaIn-Net, for real-time device-edge collaborative object inference, which aims to reduce feature transmission volume while maintaining high feature-fitting accuracy during inference. Specifically, DeepAdaIn-Net comprises a partition point selection (PPS) module, a high feature compression learning (HFCL) module, a bandwidth-aware feature configuration (BaFC) module, and a feature consistency compensation (FCC) module. The PPS module minimizes the total execution latency, including inference and transmission latency. The HFCL and BaFC modules decouple the training and inference processes by integrating a high-compression-ratio feature encoder with bandwidth-aware feature configuration, ensuring that the compressed data adapts to varying communication bandwidths. The FCC module fills the information gaps among the compressed features, preserving high feature expression ability. We conduct extensive experiments to validate DeepAdaIn-Net on two object inference datasets, COCO2017 and an emergency-fire dataset. The results demonstrate that our approach outperforms several conventional methods, achieving an optimal 123× feature compression for $640\times 640$ images, which yields a total latency of only 63.3 ms with an accuracy loss of less than 3% at a bandwidth of 16 Mbps.
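As a rough sanity check on the reported figures (not part of the paper's method), the transmission component of the total latency can be estimated from the compression ratio and link bandwidth. The sketch below assumes an uncompressed reference of 3 bytes per pixel for a $640\times 640$ RGB frame, which is an assumption not stated in the abstract:

```python
def transmission_latency_ms(raw_bytes: int, compression_ratio: float,
                            bandwidth_mbps: float) -> float:
    """Estimate time to send a compressed feature payload over the link."""
    compressed_bits = raw_bytes * 8 / compression_ratio
    return compressed_bits / (bandwidth_mbps * 1e6) * 1e3  # seconds -> ms

# Hypothetical raw frame: 640x640 pixels, 3 bytes per pixel.
raw = 640 * 640 * 3
print(round(transmission_latency_ms(raw, 123, 16), 2))  # ≈ 5.0 ms
```

Under these assumptions, transmission accounts for only about 5 ms of the 63.3 ms total, leaving the remainder for on-device and edge-side inference — consistent with the claim that compression keeps the link from dominating latency.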
Journal description:
The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others.
The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.