DeepAdaIn-Net: Deep Adaptive Device-Edge Collaborative Inference for Augmented Reality

IF 8.7 | Zone 1 (Engineering & Technology) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Li Wang;Xin Wu;Yi Zhang;Xinyun Zhang;Lianming Xu;Zhihua Wu;Aiguo Fei
DOI: 10.1109/JSTSP.2023.3312914
Journal: IEEE Journal of Selected Topics in Signal Processing (Journal Article)
Published: 2023-09-22
URL: https://ieeexplore.ieee.org/document/10261454/
Citations: 0

Abstract

Object inference for augmented reality (AR) requires precise localization of objects within the user's physical environment and adaptability to dynamic communication conditions. Deep learning (DL) excels at capturing highly nonlinear features from the diverse data sources associated with complex objects. However, existing DL techniques can suffer from disfluency or instability when deployed on resource-constrained devices under poor communication conditions, which degrades the user experience. This article addresses these issues by proposing DeepAdaIn-Net, a deep adaptive inference network for real-time device-edge collaborative object inference that aims to reduce the volume of transmitted features while ensuring high feature-fitting accuracy during inference. Specifically, DeepAdaIn-Net comprises four modules: partition point selection (PPS), high feature compression learning (HFCL), bandwidth-aware feature configuration (BaFC), and feature consistency compensation (FCC). The PPS module minimizes the total execution latency, which includes both inference and transmission latency. The HFCL and BaFC modules decouple training from inference by combining a high-compression-ratio feature encoder with bandwidth-aware feature configuration, so that the compressed features can adapt to varying communication bandwidths. The FCC module fills the information gaps among the compressed features to preserve high feature expressiveness. Extensive experiments on two object inference datasets, COCO2017 and an emergency fire dataset, show that our approach outperforms several conventional methods: it achieves an optimal 123× feature compression for $640\times 640$ images, yielding a total latency of only 63.3 ms and an accuracy loss below 3% at a bandwidth of 16 Mbps.
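The abstract states that PPS minimizes total execution latency (device inference, transmission, and edge inference) and that compressed features adapt to the available bandwidth, but it does not give the paper's formulation. The Python sketch below is therefore a generic, hypothetical illustration of that trade-off using exhaustive split-point search. Every layer profile, byte size, compression ratio, and latency budget is invented; `LayerProfile`, `select_partition`, and `choose_ratio` are illustrative names; and the rule "use the mildest compression ratio that meets the latency budget" is an assumption standing in for BaFC, not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LayerProfile:
    """Hypothetical offline-measured profile of one layer of the detector."""
    device_ms: float  # time to run this layer on the AR device
    edge_ms: float    # time to run this layer on the edge server
    out_bytes: int    # size of the layer's uncompressed output feature map


def transmit_ms(num_bytes: float, bandwidth_mbps: float) -> float:
    """Time in ms to send num_bytes over a link of bandwidth_mbps (1 Mbps = 1e6 bit/s)."""
    return num_bytes * 8 / (bandwidth_mbps * 1e3)


def total_latency_ms(layers: Sequence[LayerProfile], k: int, input_bytes: int,
                     bandwidth_mbps: float, ratio: float) -> float:
    """Device time for layers[:k] + transmission time + edge time for layers[k:]."""
    device = sum(layer.device_ms for layer in layers[:k])
    edge = sum(layer.edge_ms for layer in layers[k:])
    if k == len(layers):
        sent = 0.0                               # fully local: nothing is offloaded
    elif k == 0:
        sent = float(input_bytes)                # raw input, assumed sent uncompressed
    else:
        sent = layers[k - 1].out_bytes / ratio   # intermediate feature, compressed
    return device + transmit_ms(sent, bandwidth_mbps) + edge


def select_partition(layers: Sequence[LayerProfile], input_bytes: int,
                     bandwidth_mbps: float, ratio: float) -> Tuple[int, float]:
    """Try every split point k = 0..len(layers) and return (k, latency) of the fastest."""
    candidates = [(k, total_latency_ms(layers, k, input_bytes, bandwidth_mbps, ratio))
                  for k in range(len(layers) + 1)]
    return min(candidates, key=lambda c: c[1])


def choose_ratio(layers: Sequence[LayerProfile], input_bytes: int, bandwidth_mbps: float,
                 ratios: Sequence[float], budget_ms: float) -> Tuple[float, int, float]:
    """Return (ratio, k, latency) for the mildest ratio whose best split meets budget_ms.

    Assumes a milder ratio loses less accuracy; if no ratio meets the budget,
    falls back to the configuration with the lowest latency.
    """
    tried: List[Tuple[float, int, float]] = []
    for ratio in sorted(ratios):
        k, latency = select_partition(layers, input_bytes, bandwidth_mbps, ratio)
        if latency <= budget_ms:
            return ratio, k, latency
        tried.append((ratio, k, latency))
    return min(tried, key=lambda t: t[2])


# Toy three-layer profile; all numbers are invented for illustration.
layers = [LayerProfile(5, 1, 400_000), LayerProfile(10, 2, 100_000), LayerProfile(20, 3, 50_000)]
img = 640 * 640 * 3  # bytes in one 640x640 RGB frame at 8 bits per channel

print(select_partition(layers, img, 16, 1))             # (3, 35.0)
print(select_partition(layers, img, 16, 100))           # (1, 12.0)
print(choose_ratio(layers, img, 16, [1, 10, 100], 30))  # (10, 2, 23.0)
print(choose_ratio(layers, img, 1, [1, 10, 100], 20))   # (100, 2, 26.0)
```

With this toy profile at 16 Mbps and no compression, running everything on the device is fastest, while a 100× ratio makes a split after the first layer fastest. At 1 Mbps no ratio meets a 20 ms budget, so the sketch falls back to the lowest-latency configuration. Exhaustive search is cheap here because the number of candidate split points equals the number of layers plus one.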
Source journal
IEEE Journal of Selected Topics in Signal Processing
Category: Engineering & Technology, Electrical & Electronic Engineering
CiteScore: 19.00
Self-citation rate: 1.30%
Articles published: 135
Review time: 3 months
Journal description: The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, and musical data. The journal format allows in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. These include interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.