DeepAdaIn-Net: Deep Adaptive Device-Edge Collaborative Inference for Augmented Reality
Li Wang; Xin Wu; Yi Zhang; Xinyun Zhang; Lianming Xu; Zhihua Wu; Aiguo Fei
IEEE Journal of Selected Topics in Signal Processing
DOI: 10.1109/JSTSP.2023.3312914
Published: 2023-09-22
Abstract
Object inference for augmented reality (AR) requires precise object localization within the user's physical environment and adaptability to dynamic communication conditions. Deep learning (DL) is advantageous in capturing the highly nonlinear features of diverse data sources drawn from complex objects. However, existing DL techniques may suffer from disfluency or instability when deployed on resource-constrained devices under poor communication conditions, resulting in poor user experiences. This article addresses these issues by proposing a deep adaptive inference network, DeepAdaIn-Net, for real-time device-edge collaborative object inference, which aims to reduce the feature transmission volume while preserving high feature-fitting accuracy during inference. Specifically, DeepAdaIn-Net comprises a partition point selection (PPS) module, a high feature compression learning (HFCL) module, a bandwidth-aware feature configuration (BaFC) module, and a feature consistency compensation (FCC) module. The PPS module minimizes the total execution latency, including inference and transmission latency. The HFCL and BaFC modules decouple the training and inference processes by integrating a high-compression-ratio feature encoder with bandwidth-aware feature configuration, ensuring that the compressed data adapt to varying communication bandwidths. The FCC module fills the information gaps among the compressed features, guaranteeing high feature expression ability. We conduct extensive experiments to validate DeepAdaIn-Net on two object inference datasets, COCO2017 and an emergency fire dataset. The results demonstrate that our approach outperforms several conventional methods, achieving an optimal 123x feature compression for $640\times 640$ images, which yields a total latency of only 63.3 ms with an accuracy loss of less than 3% at a bandwidth of 16 Mbps.
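The abstract describes the PPS module as choosing the network split point that minimizes device-side inference latency plus feature upload latency plus edge-side inference latency. The following is a minimal sketch of that latency-driven selection, not the authors' implementation: the per-split costs, the assumption that the compressed payload corresponds to a 640x640x3 8-bit input divided by the reported 123x ratio, and the 16 Mbps bandwidth figure are illustrative or taken directly from the abstract.

```python
# Minimal sketch (not the authors' code): latency-driven partition point
# selection plus a back-of-the-envelope check of the reported numbers.

RAW_IMAGE_BITS = 640 * 640 * 3 * 8   # 640x640 RGB input, 8 bits per channel (assumed baseline)
COMPRESSION_RATIO = 123.0            # optimal compression ratio reported in the abstract
BANDWIDTH_BPS = 16e6                 # 16 Mbps bandwidth from the abstract


def transmission_latency_ms(payload_bits: float, bandwidth_bps: float) -> float:
    """Time to upload a payload over the given bandwidth, in milliseconds."""
    return payload_bits / bandwidth_bps * 1e3


def select_partition_point(device_ms, edge_ms, feature_bits, bandwidth_bps):
    """Pick the split index minimizing total latency: device-side inference up
    to the split + uploading the compressed intermediate feature + edge-side
    inference of the remaining layers. All three lists are indexed by split."""
    best_split, best_total = None, float("inf")
    for i in range(len(device_ms)):
        total = (device_ms[i]
                 + transmission_latency_ms(feature_bits[i], bandwidth_bps)
                 + edge_ms[i])
        if total < best_total:
            best_split, best_total = i, total
    return best_split, best_total


if __name__ == "__main__":
    # Hypothetical per-split costs for a backbone with four candidate splits.
    device_ms = [5.0, 12.0, 20.0, 35.0]    # on-device latency up to split i
    edge_ms = [40.0, 30.0, 18.0, 8.0]      # edge latency after split i
    feature_bits = [RAW_IMAGE_BITS / r for r in (1, 20, 123, 123)]

    split, total = select_partition_point(device_ms, edge_ms, feature_bits, BANDWIDTH_BPS)
    print(f"best split index: {split}, total latency ~ {total:.1f} ms")

    # Sanity check on the abstract's figures: a 123x-compressed 640x640 input
    # uploads in roughly 5 ms at 16 Mbps, so the reported 63.3 ms total budget
    # would be dominated by inference rather than transmission.
    upload_ms = transmission_latency_ms(RAW_IMAGE_BITS / COMPRESSION_RATIO, BANDWIDTH_BPS)
    print(f"upload time at 123x compression: {upload_ms:.1f} ms")
```

The design choice this illustrates is that the split decision depends on the bandwidth: under a narrow uplink, later splits with smaller (more compressed) features win even though they cost more on-device compute, which is the motivation for pairing PPS with the bandwidth-aware BaFC configuration.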
About the Journal:
The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others.
The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.