UltraMeshRenderer：用于巨大3D网格实时渲染的GPU外核内存的高效结构和管理

IF 9.5 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics Pub Date : 2025-07-27 DOI:10.1145/3731186

Huadong Zhang, Lizhou Cao, Chao Peng

{"title":"UltraMeshRenderer：用于巨大3D网格实时渲染的GPU外核内存的高效结构和管理","authors":"Huadong Zhang, Lizhou Cao, Chao Peng","doi":"10.1145/3731186","DOIUrl":null,"url":null,"abstract":"GPUs can encounter memory capacity constraints, which pose challenges for achieving real-time rendering performance when processing large 3D models that exceed available memory. State-of-the-art out-of-core rendering frameworks have leveraged Level of Detail (LOD) and frame-to-frame coherence data management techniques to optimize memory usage and minimize CPU-to-GPU data transfer costs. However, the size of view-dependently selected data may still exceed GPU memory capacity, and data transfer remains the most significant bottleneck in overall performance costs. To address these, we introduce a new GPU out-of-core rendering approach that includes a LOD selection method that takes into account both memory and coherence constraints and a parallel in-place GPU memory management algorithm that efficiently assembles the data of the current frame with GPU-resident data from the previous frame and transferred data. Our approach bounds memory usage and data transfer costs, prioritizes and schedules the transfer of essential data, incrementally refining the LOD over subsequent frames to converge toward the desired visual fidelity. Our parallel memory management algorithm consolidates frame-different and reusable data, dynamically reallocating GPU memory slots for efficient in-place operations. Hierarchical LOD representations remain a core component, and we emphasize their role in supporting adaptive data transfer and coherence management, characterized by a uniform depth and near-equal patch size at all levels. Our approach adapts seamlessly to scenarios with varying levels of coherence by balancing real-time performance with visual consistency. Experimental results demonstrate that our system achieves significant performance improvements, rendering scenes with billions of triangles in real-time, outperforming existing methods while maintaining consistent visual quality during dynamic interactions.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"22 1","pages":""},"PeriodicalIF":9.5000,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UltraMeshRenderer: Efficient Structure and Management of GPU Out-of-core Memory for Real-time Rendering of Gigantic 3D Meshes\",\"authors\":\"Huadong Zhang, Lizhou Cao, Chao Peng\",\"doi\":\"10.1145/3731186\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPUs can encounter memory capacity constraints, which pose challenges for achieving real-time rendering performance when processing large 3D models that exceed available memory. State-of-the-art out-of-core rendering frameworks have leveraged Level of Detail (LOD) and frame-to-frame coherence data management techniques to optimize memory usage and minimize CPU-to-GPU data transfer costs. However, the size of view-dependently selected data may still exceed GPU memory capacity, and data transfer remains the most significant bottleneck in overall performance costs. To address these, we introduce a new GPU out-of-core rendering approach that includes a LOD selection method that takes into account both memory and coherence constraints and a parallel in-place GPU memory management algorithm that efficiently assembles the data of the current frame with GPU-resident data from the previous frame and transferred data. Our approach bounds memory usage and data transfer costs, prioritizes and schedules the transfer of essential data, incrementally refining the LOD over subsequent frames to converge toward the desired visual fidelity. Our parallel memory management algorithm consolidates frame-different and reusable data, dynamically reallocating GPU memory slots for efficient in-place operations. Hierarchical LOD representations remain a core component, and we emphasize their role in supporting adaptive data transfer and coherence management, characterized by a uniform depth and near-equal patch size at all levels. Our approach adapts seamlessly to scenarios with varying levels of coherence by balancing real-time performance with visual consistency. Experimental results demonstrate that our system achieves significant performance improvements, rendering scenes with billions of triangles in real-time, outperforming existing methods while maintaining consistent visual quality during dynamic interactions.\",\"PeriodicalId\":50913,\"journal\":{\"name\":\"ACM Transactions on Graphics\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":9.5000,\"publicationDate\":\"2025-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Graphics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3731186\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3731186","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

gpu可能会遇到内存容量限制，这对处理超过可用内存的大型3D模型时实现实时渲染性能构成了挑战。最先进的核心外渲染框架利用了细节级别（LOD）和帧对帧一致性数据管理技术来优化内存使用并最小化cpu到gpu的数据传输成本。然而，与视图相关的选择数据的大小可能仍然超过GPU内存容量，并且数据传输仍然是总体性能成本中最重要的瓶颈。为了解决这些问题，我们引入了一种新的GPU离核渲染方法，其中包括一种考虑内存和一致性约束的LOD选择方法，以及一种并行的GPU内存管理算法，该算法有效地将当前帧的数据与来自前一帧的GPU驻留数据和传输数据组装在一起。我们的方法限制了内存使用和数据传输成本，对基本数据的传输进行优先级排序和调度，在随后的帧中逐步改进LOD，以收敛到所需的视觉保真度。我们的并行内存管理算法整合了不同帧和可重用的数据，动态重新分配GPU内存插槽，以实现高效的就地操作。分层LOD表示仍然是一个核心组件，我们强调它们在支持自适应数据传输和一致性管理方面的作用，其特点是在所有级别上具有统一的深度和接近相等的补丁大小。我们的方法通过平衡实时性能和视觉一致性，无缝地适应不同连贯程度的场景。实验结果表明，我们的系统实现了显着的性能改进，实时渲染数十亿三角形的场景，优于现有方法，同时在动态交互过程中保持一致的视觉质量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

UltraMeshRenderer: Efficient Structure and Management of GPU Out-of-core Memory for Real-time Rendering of Gigantic 3D Meshes

GPUs can encounter memory capacity constraints, which pose challenges for achieving real-time rendering performance when processing large 3D models that exceed available memory. State-of-the-art out-of-core rendering frameworks have leveraged Level of Detail (LOD) and frame-to-frame coherence data management techniques to optimize memory usage and minimize CPU-to-GPU data transfer costs. However, the size of view-dependently selected data may still exceed GPU memory capacity, and data transfer remains the most significant bottleneck in overall performance costs. To address these, we introduce a new GPU out-of-core rendering approach that includes a LOD selection method that takes into account both memory and coherence constraints and a parallel in-place GPU memory management algorithm that efficiently assembles the data of the current frame with GPU-resident data from the previous frame and transferred data. Our approach bounds memory usage and data transfer costs, prioritizes and schedules the transfer of essential data, incrementally refining the LOD over subsequent frames to converge toward the desired visual fidelity. Our parallel memory management algorithm consolidates frame-different and reusable data, dynamically reallocating GPU memory slots for efficient in-place operations. Hierarchical LOD representations remain a core component, and we emphasize their role in supporting adaptive data transfer and coherence management, characterized by a uniform depth and near-equal patch size at all levels. Our approach adapts seamlessly to scenarios with varying levels of coherence by balancing real-time performance with visual consistency. Experimental results demonstrate that our system achieves significant performance improvements, rendering scenes with billions of triangles in real-time, outperforming existing methods while maintaining consistent visual quality during dynamic interactions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Graphics 工程技术-计算机：软件工程

CiteScore

14.30

自引率

25.80%

发文量

193

审稿时长

12 months

期刊介绍： ACM Transactions on Graphics (TOG) is a peer-reviewed scientific journal that aims to disseminate the latest findings of note in the field of computer graphics. It has been published since 1982 by the Association for Computing Machinery. Starting in 2003, all papers accepted for presentation at the annual SIGGRAPH conference are printed in a special summer issue of the journal.