Multi-level texture caching for 3D graphics hardware

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235). Pub Date: 1998-04-16. DOI: 10.1109/ISCA.1998.694765

M. Cox, Narendra Bhandari, M. Shantz


Citations: 57

Abstract

Traditional graphics hardware architectures implement what we call the push architecture for texture mapping. Local memory is dedicated to the accelerator for fast local retrieval of texture during rasterization, and the application is responsible for managing this memory. The push architecture has a bandwidth advantage, but disadvantages of limited texture capacity, escalation of accelerator memory requirements (and therefore cost), and poor memory utilization. The push architecture also requires the programmer to solve the bin-packing problem of managing accelerator memory each frame. More recently, graphics hardware on PC-class machines has moved to an implementation of what we call the pull architecture. Texture is stored in system memory and downloaded by the accelerator as needed. The pull architecture offers greater texture capacity, stems the escalation of accelerator memory requirements, and has good memory utilization. It also frees the programmer from accelerator texture memory management. However, the pull architecture suffers escalating requirements for bandwidth from main memory to the accelerator. In this paper we propose multi-level texture caching to provide the accelerator with the bandwidth advantages of the push architecture combined with the capacity advantages of the pull architecture. We have studied the feasibility of 2-level caching and found the following: (1) significant re-use of texture between frames; (2) L2 caching requires significantly less memory than the push architecture; (3) L2 caching requires significantly less bandwidth from host memory than the pull architecture; (4) L2 caching enables implementation of smaller L1 caches that would otherwise bandwidth-limit accelerators on the workloads in this paper. Results suggest that an L2 cache achieves the original advantage of the pull architecture, stemming the growth of local texture memory, while at the same time stemming the current explosion in demand for texture bandwidth between host memory and the accelerator.
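
The two-level idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulator: the LRU replacement policy, the block-granularity model, the cache sizes, the synthetic workload, and the names `LRUCache` and `host_fetches` are all assumptions made for illustration. It counts how many texture blocks must be fetched from host memory with a small L1 cache alone (a pull-style configuration) and with an added L2 cache in accelerator memory.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache of texture block ids (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Return True on a hit; on a miss, insert the block and evict the LRU one."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        self.blocks[block_id] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used block
        return False


def host_fetches(frames, l1_capacity, l2_capacity=0):
    """Count blocks fetched from host memory over a sequence of frames.

    With l2_capacity == 0 every L1 miss goes to host memory; otherwise an
    L1 miss is first looked up in L2 and only an L2 miss reaches the host.
    """
    l1 = LRUCache(l1_capacity)
    l2 = LRUCache(l2_capacity) if l2_capacity > 0 else None
    fetches = 0
    for frame in frames:
        for block_id in frame:
            if l1.access(block_id):
                continue
            if l2 is not None and l2.access(block_id):
                continue
            fetches += 1
    return fetches


# Synthetic workload (an assumption): every frame touches the same 64 blocks,
# an extreme case of the inter-frame texture re-use the abstract reports.
frames = [list(range(64)) for _ in range(10)]

pull_only = host_fetches(frames, l1_capacity=8)
two_level = host_fetches(frames, l1_capacity=8, l2_capacity=64)
print(pull_only, two_level)
```

In this contrived workload the small L1 cache thrashes on every frame, so without an L2 each access reaches host memory, while an L2 large enough to hold one frame's working set limits host traffic to the first frame. Real workloads, replacement policies, and block sizes would change the numbers; the sketch only shows the mechanism behind findings (1), (3), and (4).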

