{"title":"在PC平台上实现高质量、高性能3D图形的未解决问题和机会","authors":"David B. Kirk","doi":"10.1145/285305.285306","DOIUrl":null,"url":null,"abstract":"In the late 1990’s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent to the minicomputer-to-microprocessor revolution of the 1980’s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics Reality EngineTM system. The extreme pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to the highest performance 3D workstations. At that time, the quality and performance demands will dictate serious changes in PC architecture as well as changes in rendering pipeline and algorithms. This paper will discuss several potential areas of change. A GENERAL PROBLEM STATEMENT The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-AidedDesign (CAD), models may be static and retained mode or display list APIs may be used, it is common in games that geometry and textures change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and calculation capabilities of the graphics pipeline. GEOMETRY AND PIXEL THROUGHPUT As a baseline, we’ll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, this is a bandwidth analysis of a 400 MHz Intel Pentium IITM PC with an Nvidia RNA TNTTM graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 1 OOM 32bit pixels/second, 2 textures/pixel requires: 1 M triangles * 3 vertices/triangle * 32 bytes/vertex = 100 MB; triangle data crosses the bus 3-5 times (read, transform and written by the CPU, and read by the graphics processor, so simply copying triangle data requires 300-500 MB/second on the PC buses. 1OOM pixels * 8 bytes/pixel (32bit RGBA, 32bit Z/stencil) = 800 MB; with 50% overhead for RMW requires 1.2 GB/second 2 textures/pixel * 4 texelsltexture * 2 bytee a texture cache can create up to 4X reuse efficiency, so requires 400 MB/second Assumptions here include: 32-byte vertices are Direct3DTM TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) triangle setup is done on the graphics processor bilinear texture filtering 16bit texels are RSG6B5 50% of pixels written after Zbuffer read/compare Transferring triangle vertex data to the graphics processor from the CPU is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were limiting factors. GEOMETRY REPRESENTATION As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly become creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application. 
As","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"389 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Unsolved problems and opportunities for high-quality, high-performance 3D graphics on a PC platform\",\"authors\":\"David B. Kirk\",\"doi\":\"10.1145/285305.285306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the late 1990’s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent to the minicomputer-to-microprocessor revolution of the 1980’s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics Reality EngineTM system. The extreme pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to the highest performance 3D workstations. At that time, the quality and performance demands will dictate serious changes in PC architecture as well as changes in rendering pipeline and algorithms. This paper will discuss several potential areas of change. A GENERAL PROBLEM STATEMENT The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-AidedDesign (CAD), models may be static and retained mode or display list APIs may be used, it is common in games that geometry and textures change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and calculation capabilities of the graphics pipeline. GEOMETRY AND PIXEL THROUGHPUT As a baseline, we’ll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, this is a bandwidth analysis of a 400 MHz Intel Pentium IITM PC with an Nvidia RNA TNTTM graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 1 OOM 32bit pixels/second, 2 textures/pixel requires: 1 M triangles * 3 vertices/triangle * 32 bytes/vertex = 100 MB; triangle data crosses the bus 3-5 times (read, transform and written by the CPU, and read by the graphics processor, so simply copying triangle data requires 300-500 MB/second on the PC buses. 1OOM pixels * 8 bytes/pixel (32bit RGBA, 32bit Z/stencil) = 800 MB; with 50% overhead for RMW requires 1.2 GB/second 2 textures/pixel * 4 texelsltexture * 2 bytee a texture cache can create up to 4X reuse efficiency, so requires 400 MB/second Assumptions here include: 32-byte vertices are Direct3DTM TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) triangle setup is done on the graphics processor bilinear texture filtering 16bit texels are RSG6B5 50% of pixels written after Zbuffer read/compare Transferring triangle vertex data to the graphics processor from the CPU is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were limiting factors. 
GEOMETRY REPRESENTATION As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly become creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application. As\",\"PeriodicalId\":298241,\"journal\":{\"name\":\"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware\",\"volume\":\"389 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/285305.285306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/285305.285306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unsolved problems and opportunities for high-quality, high-performance 3D graphics on a PC platform
In the late 1990s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent of the minicomputer-to-microprocessor revolution of the 1980s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics RealityEngine™ system. The pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to that of the highest-performance 3D workstations. At that point, quality and performance demands will dictate serious changes in PC architecture as well as changes in the rendering pipeline and algorithms. This paper will discuss several potential areas of change.

A GENERAL PROBLEM STATEMENT

The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-Aided Design (CAD) models may be static and retained-mode or display-list APIs may be used, in games it is common for geometry and textures to change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and the calculation capabilities of the graphics pipeline.

GEOMETRY AND PIXEL THROUGHPUT

As a baseline, we'll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, here is a bandwidth analysis of a 400 MHz Intel Pentium II™ PC with an Nvidia RIVA TNT™ graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 100M 32-bit pixels/second, and 2 textures/pixel:

- Geometry: 1M triangles * 3 vertices/triangle * 32 bytes/vertex ≈ 100 MB/second. Triangle data crosses the bus 3-5 times (read, transformed, and written by the CPU, then read by the graphics processor), so simply copying triangle data requires 300-500 MB/second on the PC buses.
- Pixels: 100M pixels * 8 bytes/pixel (32-bit RGBA, 32-bit Z/stencil) = 800 MB/second; with 50% overhead for read-modify-write, this requires 1.2 GB/second.
- Textures: 2 textures/pixel * 4 texels/texture * 2 bytes/texel = 16 bytes/pixel, or 1.6 GB/second; a texture cache can create up to 4X reuse efficiency, so this requires 400 MB/second.

Assumptions here include: 32-byte vertices are Direct3D™ TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W); triangle setup is done on the graphics processor; bilinear texture filtering; 16-bit texels are R5G6B5; 50% of pixels are written after the Z-buffer read/compare.

Transferring triangle vertex data from the CPU to the graphics processor is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were the limiting factors.

GEOMETRY REPRESENTATION

As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly be the creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application.
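
As a sanity check on the GEOMETRY AND PIXEL THROUGHPUT figures, the following C++ sketch reproduces the counting exercise under the stated assumptions. The TLVertex32 layout is an illustrative guess at a 32-byte transformed-and-lit vertex (packed diffuse and specular colors plus one texture-coordinate pair); it is not taken from the paper or from any Direct3D header, and the exact arithmetic differs slightly from the rounded numbers above (96 MB/s vs. the quoted 100 MB/s).

    // Back-of-the-envelope bandwidth counting for the workload described above.
    // The struct layout is an illustrative guess at a 32-byte transformed-and-lit
    // vertex; the field list (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) in the abstract is only
    // approximated here, not copied from any particular API header.
    #include <cstdint>
    #include <cstdio>

    struct TLVertex32 {                 // assumed 32-byte post-transform vertex
        float    x, y, z, w;            // screen-space position + reciprocal W
        uint32_t diffuse;               // packed R, G, B, A
        uint32_t specular;              // packed SR, SG, SB + fog factor F
        float    u, v;                  // one set of texture coordinates
    };
    static_assert(sizeof(TLVertex32) == 32, "vertex must be 32 bytes");

    int main() {
        const double triangles_per_s = 1e6;     // 1M triangles/second
        const double pixels_per_s    = 100e6;   // 100M pixels/second
        const double textures_per_px = 2.0;     // dual texturing
        const double texels_per_tex  = 4.0;     // bilinear filtering
        const double bytes_per_texel = 2.0;     // R5G6B5

        // Geometry: independent triangles, 3 vertices each, crossing the bus 3-5x.
        double geom = triangles_per_s * 3 * sizeof(TLVertex32);
        std::printf("geometry (1 pass): %.0f MB/s, 3-5 passes: %.0f-%.0f MB/s\n",
                    geom / 1e6, 3 * geom / 1e6, 5 * geom / 1e6);

        // Pixels: 4 bytes color + 4 bytes Z/stencil, plus 50% read-modify-write overhead.
        double pix = pixels_per_s * 8.0;
        std::printf("pixels: %.0f MB/s, with RMW overhead: %.1f GB/s\n",
                    pix / 1e6, 1.5 * pix / 1e9);

        // Textures: raw fetch rate divided by an assumed 4x cache reuse factor.
        double tex = pixels_per_s * textures_per_px * texels_per_tex * bytes_per_texel;
        std::printf("textures: raw %.1f GB/s, with 4x cache reuse: %.0f MB/s\n",
                    tex / 1e9, tex / 4.0 / 1e6);
        return 0;
    }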
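
The abstract stops before describing remedies for the geometry-transfer bottleneck. One standard way to reduce per-triangle bus traffic, shared-vertex geometry (indexed lists or triangle strips), is easy to quantify; the sketch below is not from the paper, and the sharing ratios are rough rules of thumb for well-ordered meshes rather than measured values.

    // Approximate vertex traffic per triangle for common geometry encodings,
    // assuming the same 32-byte vertex as above and 16-bit indices.
    // These encodings are standard practice, not proposals from the paper.
    #include <cstdio>

    int main() {
        const double vertex_bytes = 32.0;
        const double index_bytes  = 2.0;   // 16-bit index

        // Independent triangle list: every triangle ships 3 full vertices.
        double independent = 3 * vertex_bytes;

        // Indexed list over a well-shared mesh: in a regular grid each vertex is
        // shared by ~6 triangles, so roughly 0.5 new vertices per triangle,
        // plus 3 indices per triangle.
        double indexed = 0.5 * vertex_bytes + 3 * index_bytes;

        // Long triangle strip: roughly 1 new vertex per triangle.
        double strip = 1.0 * vertex_bytes;

        std::printf("independent list: %.0f bytes/tri\n", independent);
        std::printf("indexed mesh:     %.0f bytes/tri (approx.)\n", indexed);
        std::printf("long strip:       %.0f bytes/tri\n", strip);
        return 0;
    }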