An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration
{"title":"用于大范围自主探索的879GOPS 243mW 80fps VGA全视觉CNN-SLAM处理器","authors":"Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim","doi":"10.1109/ISSCC.2019.8662397","DOIUrl":null,"url":null,"abstract":"Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1 right), which involves a large-scale ($\\sim$120 variables) non-linear optimization. Visual SLAM requires massive computation ($\\gt250$ GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2s runtime with a $\\sim$3 GHz CPU + GPU system with $\\gt100$ MB memory footprint and $\\gt100$ W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.","PeriodicalId":265551,"journal":{"name":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration\",\"authors\":\"Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim\",\"doi\":\"10.1109/ISSCC.2019.8662397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1 right), which involves a large-scale ($\\\\sim$120 variables) non-linear optimization. 
Visual SLAM requires massive computation ($\\\\gt250$ GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2s runtime with a $\\\\sim$3 GHz CPU + GPU system with $\\\\gt100$ MB memory footprint and $\\\\gt100$ W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.\",\"PeriodicalId\":265551,\"journal\":{\"name\":\"2019 IEEE International Solid- State Circuits Conference - (ISSCC)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Solid- State Circuits Conference - (ISSCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSCC.2019.8662397\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC.2019.8662397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Ziyun Li, Yu Chen, Luyao Gong, Lu Liu, D. Sylvester, D. Blaauw, Hun-Seok Kim
Simultaneous localization and mapping (SLAM) estimates an agent's trajectory in all six degrees of freedom (6 DoF) and constructs a 3D map of its unknown surroundings. It is a fundamental kernel enabling head-mounted augmented/virtual-reality devices and autonomous navigation of micro aerial vehicles. A notable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs), which outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched against stored keypoints to estimate the agent's 6-DoF pose by solving a perspective-n-point (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent's long-term trajectory over multiple frames is refined by bundle adjustment (BA, Fig. 7.3.1, right), a large-scale (~120 variables) non-linear optimization. Visual SLAM requires massive computation (>250 GOP/s) for CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical: a ~3GHz CPU + GPU system takes 0.2s per frame with a >100MB memory footprint and >100W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks ego-motion estimation, or a simplified (non-CNN) feature extraction and tracking pipeline [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), reducing computational complexity but incurring additional power and cost overhead.
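As a rough software illustration of the per-frame tracking step described above (not the paper's hardware pipeline), the following Python sketch matches per-frame feature descriptors against stored map keypoints and recovers a 6-DoF pose with OpenCV's RANSAC-based PnP solver. The descriptor arrays and camera intrinsics are hypothetical placeholders; in the paper the descriptors come from a CNN and the matching and optimization run in dedicated hardware.

    import numpy as np
    import cv2

    # Hypothetical inputs: 3D map keypoints with stored descriptors, plus
    # 2D features extracted from the current VGA frame. Stand-in random
    # arrays here; a real system would use CNN-extracted descriptors.
    map_points_3d = np.random.rand(500, 3).astype(np.float32)        # world coordinates
    map_descriptors = np.random.rand(500, 256).astype(np.float32)    # stored keypoint descriptors
    frame_keypoints_2d = np.random.rand(300, 2).astype(np.float32)   # pixel coordinates
    frame_descriptors = np.random.rand(300, 256).astype(np.float32)  # per-frame descriptors

    # Assumed pinhole intrinsics for a 640x480 (VGA) camera.
    K = np.array([[525.0, 0.0, 320.0],
                  [0.0, 525.0, 240.0],
                  [0.0, 0.0, 1.0]], dtype=np.float32)

    # 1) Match frame descriptors against the stored map (brute-force L2).
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(frame_descriptors, map_descriptors)
    obj_pts = np.array([map_points_3d[m.trainIdx] for m in matches])
    img_pts = np.array([frame_keypoints_2d[m.queryIdx] for m in matches])

    # 2) Estimate the 6-DoF pose by solving PnP inside a RANSAC loop,
    #    which wraps the non-linear optimization mentioned in the text.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    if ok:
        R, _ = cv2.Rodrigues(rvec)  # rotation (3 DoF) + translation (3 DoF)
        print("rotation:\n", R, "\ntranslation:", tvec.ravel())

Bundle adjustment would then jointly refine the poses and 3D points accumulated over multiple such frames by minimizing total reprojection error, the large-scale non-linear optimization referred to above.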