NeRF-PIM：神经渲染网络的 PIM 硬件-软件协同设计

IF 2.7 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI:10.1109/TCAD.2024.3443712

Jaeyoung Heo;Sungjoo Yoo

{"title":"NeRF-PIM：神经渲染网络的 PIM 硬件-软件协同设计","authors":"Jaeyoung Heo;Sungjoo Yoo","doi":"10.1109/TCAD.2024.3443712","DOIUrl":null,"url":null,"abstract":"Neural radiance field (NeRF) has emerged as a state-of-the-art technique, offering unprecedented realism in rendering. Despite its advancements, the adoption of NeRF is constrained by high computational cost, leading to slow rendering speed. Voxel-based optimization of NeRF addresses this by reducing the computational cost, but it introduces substantial memory overheads. To address this problem, we propose NeRF-PIM, a hardware-software co-design approach. In order to address the problem of the memory accesses to the large model (of the voxel grid) with poor locality and low compute density, we propose exploiting processing-in-memory (PIM) together with PIM-aware software optimizations in terms of the data layout, redundancy removal, and computation reuse. Our PIM hardware aims to accelerate the trilinear interpolation and dot product operations. Specifically, to address the low utilization of internal bandwidth due to the random accesses to the voxels, we propose a data layout that judiciously exploits the characteristics of the interpolation operation on the voxel grid, which helps remove bank conflicts in voxel accesses and also improves the efficiency of PIM command issue by exploiting the all-bank mode in the existing PIM device. As PIM-aware software optimizations, we also propose occupancy-grid-aware pruning and one-voxel two-sampling (1V2S) methods, which contribute to compute the efficiency improvement (by avoiding the redundant computation on the empty space) and memory traffic reduction (by reusing the per-voxel dot product results). We conduct experiments using an actual baseline HBM-PIM device. Our NeRF-PIM demonstrates a speedup of 7.4 and \n<inline-formula> <tex-math>$5.0\\times $ </tex-math></inline-formula>\n compared to the baseline on the two datasets, Synthetic-NeRF and Tanks and Temples, respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3900-3912"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NeRF-PIM: PIM Hardware-Software Co-Design of Neural Rendering Networks\",\"authors\":\"Jaeyoung Heo;Sungjoo Yoo\",\"doi\":\"10.1109/TCAD.2024.3443712\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural radiance field (NeRF) has emerged as a state-of-the-art technique, offering unprecedented realism in rendering. Despite its advancements, the adoption of NeRF is constrained by high computational cost, leading to slow rendering speed. Voxel-based optimization of NeRF addresses this by reducing the computational cost, but it introduces substantial memory overheads. To address this problem, we propose NeRF-PIM, a hardware-software co-design approach. In order to address the problem of the memory accesses to the large model (of the voxel grid) with poor locality and low compute density, we propose exploiting processing-in-memory (PIM) together with PIM-aware software optimizations in terms of the data layout, redundancy removal, and computation reuse. Our PIM hardware aims to accelerate the trilinear interpolation and dot product operations. Specifically, to address the low utilization of internal bandwidth due to the random accesses to the voxels, we propose a data layout that judiciously exploits the characteristics of the interpolation operation on the voxel grid, which helps remove bank conflicts in voxel accesses and also improves the efficiency of PIM command issue by exploiting the all-bank mode in the existing PIM device. As PIM-aware software optimizations, we also propose occupancy-grid-aware pruning and one-voxel two-sampling (1V2S) methods, which contribute to compute the efficiency improvement (by avoiding the redundant computation on the empty space) and memory traffic reduction (by reusing the per-voxel dot product results). We conduct experiments using an actual baseline HBM-PIM device. Our NeRF-PIM demonstrates a speedup of 7.4 and \\n<inline-formula> <tex-math>$5.0\\\\times $ </tex-math></inline-formula>\\n compared to the baseline on the two datasets, Synthetic-NeRF and Tanks and Temples, respectively.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3900-3912\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745790/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745790/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

神经辐射场（NeRF）已成为一种最先进的技术，可提供前所未有的逼真渲染效果。尽管 NeRF 技术不断进步，但其应用仍受到高计算成本的限制，导致渲染速度缓慢。基于体素的 NeRF 优化通过降低计算成本解决了这一问题，但却带来了大量内存开销。为解决这一问题，我们提出了软硬件协同设计方法 NeRF-PIM。为了解决大型模型（体素网格）的内存访问定位性差和计算密度低的问题，我们提出利用内存处理（PIM），并在数据布局、冗余消除和计算重用方面进行 PIM 感知软件优化。我们的 PIM 硬件旨在加速三线性插值和点乘操作。具体来说，为了解决随机存取体素导致的内部带宽利用率低的问题，我们提出了一种数据布局，该布局可明智地利用体素网格上插值操作的特性，有助于消除体素存取中的库冲突，还可利用现有 PIM 设备中的全库模式提高 PIM 命令发布的效率。作为 PIM 感知软件优化，我们还提出了占用网格感知修剪和单体素双采样（1V2S）方法，这有助于计算效率的提高（通过避免对空白空间的冗余计算）和内存流量的减少（通过重复使用每个体素点乘结果）。我们使用实际的基准 HBM-PIM 设备进行了实验。与基线相比，我们的 NeRF-PIM 在 Synthetic-NeRF 和 Tanks and Temples 这两个数据集上的速度分别提高了 7.4 倍和 5.0 倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

NeRF-PIM: PIM Hardware-Software Co-Design of Neural Rendering Networks

Neural radiance field (NeRF) has emerged as a state-of-the-art technique, offering unprecedented realism in rendering. Despite its advancements, the adoption of NeRF is constrained by high computational cost, leading to slow rendering speed. Voxel-based optimization of NeRF addresses this by reducing the computational cost, but it introduces substantial memory overheads. To address this problem, we propose NeRF-PIM, a hardware-software co-design approach. In order to address the problem of the memory accesses to the large model (of the voxel grid) with poor locality and low compute density, we propose exploiting processing-in-memory (PIM) together with PIM-aware software optimizations in terms of the data layout, redundancy removal, and computation reuse. Our PIM hardware aims to accelerate the trilinear interpolation and dot product operations. Specifically, to address the low utilization of internal bandwidth due to the random accesses to the voxels, we propose a data layout that judiciously exploits the characteristics of the interpolation operation on the voxel grid, which helps remove bank conflicts in voxel accesses and also improves the efficiency of PIM command issue by exploiting the all-bank mode in the existing PIM device. As PIM-aware software optimizations, we also propose occupancy-grid-aware pruning and one-voxel two-sampling (1V2S) methods, which contribute to compute the efficiency improvement (by avoiding the redundant computation on the empty space) and memory traffic reduction (by reusing the per-voxel dot product results). We conduct experiments using an actual baseline HBM-PIM device. Our NeRF-PIM demonstrates a speedup of 7.4 and

$5.0\times $

compared to the baseline on the two datasets, Synthetic-NeRF and Tanks and Temples, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 工程技术-工程：电子与电气

CiteScore

5.60

自引率

13.80%

发文量

500

审稿时长

7 months

期刊介绍： The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.