用于大范围自主探索的879GOPS 243mW 80fps VGA全视觉CNN-SLAM处理器

Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim
{"title":"用于大范围自主探索的879GOPS 243mW 80fps VGA全视觉CNN-SLAM处理器","authors":"Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim","doi":"10.1109/ISSCC.2019.8662397","DOIUrl":null,"url":null,"abstract":"Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1 right), which involves a large-scale ($\\sim$120 variables) non-linear optimization. Visual SLAM requires massive computation ($\\gt250$ GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2s runtime with a $\\sim$3 GHz CPU + GPU system with $\\gt100$ MB memory footprint and $\\gt100$ W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.","PeriodicalId":265551,"journal":{"name":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration\",\"authors\":\"Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim\",\"doi\":\"10.1109/ISSCC.2019.8662397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1 right), which involves a large-scale ($\\\\sim$120 variables) non-linear optimization. Visual SLAM requires massive computation ($\\\\gt250$ GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2s runtime with a $\\\\sim$3 GHz CPU + GPU system with $\\\\gt100$ MB memory footprint and $\\\\gt100$ W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.\",\"PeriodicalId\":265551,\"journal\":{\"name\":\"2019 IEEE International Solid- State Circuits Conference - (ISSCC)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Solid- State Circuits Conference - (ISSCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSCC.2019.8662397\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC.2019.8662397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

摘要

同时定位和映射(SLAM)估计代理的所有6个自由度(6 DoF)的轨迹,并构建未知环境的3D地图。它是实现头戴式增强/虚拟现实设备和微型飞行器自主导航的基本内核。视觉SLAM的一个值得注意的最新趋势是应用计算和内存密集型卷积神经网络(cnn),其优于传统的手工设计的基于特征的方法[1]。对于每个视频帧,cnn提取的特征与存储的关键点相匹配,通过求解一个视角-n点(PnP)非线性优化问题来估计智能体的6自由度姿态(图7.3.1,左)。智能体在多个帧上的长期轨迹通过束调整过程(BA,图7.3.1右)进行细化,该过程涉及大规模($\sim$120变量)非线性优化。Visual SLAM在基于cnn的特征提取和匹配中需要大量的计算($\gt250$ GOP/s),以及基于数据的动态内存访问和高精度操作的控制流,这给低功耗设计带来了重大挑战。软件实现不切实际,导致运行时间为0.2s, CPU + GPU系统为$ $ sim$ 3ghz,内存占用$ $ gt100$ MB,功耗$ $ gt100$ W。先前的asic要么实现了不完整的SLAM系统[2,3],缺乏对自我运动的估计,要么采用了简化的(非cnn)特征提取和跟踪[2,4,5],限制了SLAM的质量和范围。最近的ASIC[5]通过片外高精度惯性测量单元(IMU)增强了视觉SLAM,简化了计算复杂度,但产生了额外的功率和成本开销。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration
Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1 right), which involves a large-scale ($\sim$120 variables) non-linear optimization. Visual SLAM requires massive computation ($\gt250$ GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2s runtime with a $\sim$3 GHz CPU + GPU system with $\gt100$ MB memory footprint and $\gt100$ W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信