{"title":"在PC平台上实现高质量、高性能3D图形的未解决问题和机会","authors":"David B. Kirk","doi":"10.1145/285305.285306","DOIUrl":null,"url":null,"abstract":"In the late 1990’s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent to the minicomputer-to-microprocessor revolution of the 1980’s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics Reality EngineTM system. The extreme pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to the highest performance 3D workstations. At that time, the quality and performance demands will dictate serious changes in PC architecture as well as changes in rendering pipeline and algorithms. This paper will discuss several potential areas of change. A GENERAL PROBLEM STATEMENT The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-AidedDesign (CAD), models may be static and retained mode or display list APIs may be used, it is common in games that geometry and textures change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and calculation capabilities of the graphics pipeline. GEOMETRY AND PIXEL THROUGHPUT As a baseline, we’ll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, this is a bandwidth analysis of a 400 MHz Intel Pentium IITM PC with an Nvidia RNA TNTTM graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 1 OOM 32bit pixels/second, 2 textures/pixel requires: 1 M triangles * 3 vertices/triangle * 32 bytes/vertex = 100 MB; triangle data crosses the bus 3-5 times (read, transform and written by the CPU, and read by the graphics processor, so simply copying triangle data requires 300-500 MB/second on the PC buses. 1OOM pixels * 8 bytes/pixel (32bit RGBA, 32bit Z/stencil) = 800 MB; with 50% overhead for RMW requires 1.2 GB/second 2 textures/pixel * 4 texelsltexture * 2 bytee a texture cache can create up to 4X reuse efficiency, so requires 400 MB/second Assumptions here include: 32-byte vertices are Direct3DTM TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) triangle setup is done on the graphics processor bilinear texture filtering 16bit texels are RSG6B5 50% of pixels written after Zbuffer read/compare Transferring triangle vertex data to the graphics processor from the CPU is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were limiting factors. GEOMETRY REPRESENTATION As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly become creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application. 
As","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"389 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Unsolved problems and opportunities for high-quality, high-performance 3D graphics on a PC platform\",\"authors\":\"David B. Kirk\",\"doi\":\"10.1145/285305.285306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the late 1990’s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent to the minicomputer-to-microprocessor revolution of the 1980’s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics Reality EngineTM system. The extreme pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to the highest performance 3D workstations. At that time, the quality and performance demands will dictate serious changes in PC architecture as well as changes in rendering pipeline and algorithms. This paper will discuss several potential areas of change. A GENERAL PROBLEM STATEMENT The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-AidedDesign (CAD), models may be static and retained mode or display list APIs may be used, it is common in games that geometry and textures change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and calculation capabilities of the graphics pipeline. GEOMETRY AND PIXEL THROUGHPUT As a baseline, we’ll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, this is a bandwidth analysis of a 400 MHz Intel Pentium IITM PC with an Nvidia RNA TNTTM graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 1 OOM 32bit pixels/second, 2 textures/pixel requires: 1 M triangles * 3 vertices/triangle * 32 bytes/vertex = 100 MB; triangle data crosses the bus 3-5 times (read, transform and written by the CPU, and read by the graphics processor, so simply copying triangle data requires 300-500 MB/second on the PC buses. 1OOM pixels * 8 bytes/pixel (32bit RGBA, 32bit Z/stencil) = 800 MB; with 50% overhead for RMW requires 1.2 GB/second 2 textures/pixel * 4 texelsltexture * 2 bytee a texture cache can create up to 4X reuse efficiency, so requires 400 MB/second Assumptions here include: 32-byte vertices are Direct3DTM TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) triangle setup is done on the graphics processor bilinear texture filtering 16bit texels are RSG6B5 50% of pixels written after Zbuffer read/compare Transferring triangle vertex data to the graphics processor from the CPU is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were limiting factors. 
GEOMETRY REPRESENTATION As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly become creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application. As\",\"PeriodicalId\":298241,\"journal\":{\"name\":\"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware\",\"volume\":\"389 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/285305.285306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/285305.285306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unsolved problems and opportunities for high-quality, high-performance 3D graphics on a PC platform
In the late 1990s, graphics hardware is experiencing a dramatic board-to-chip integration reminiscent of the minicomputer-to-microprocessor revolution of the 1980s. Today, mass-market PCs are beginning to match the 3D polygon and pixel rendering of a 1992 Silicon Graphics RealityEngine™ system. The pace of technology evolution in the PC market is such that within 1 or 2 years the performance of a mainstream PC will be very close to that of the highest-performance 3D workstations. At that point, quality and performance demands will dictate serious changes in PC architecture as well as changes in the rendering pipeline and algorithms. This paper will discuss several potential areas of change.

A GENERAL PROBLEM STATEMENT

The biggest focus of 3D graphics applications on the PC is interactive entertainment, or games. This workload is extremely dynamic, with continuous updating of geometry, textures, animation, lighting, and shading. Although in other applications such as Computer-Aided Design (CAD) models may be static and retained-mode or display-list APIs may be used, in games it is common for geometry and textures to change regularly. A good operating assumption is that everything changes every frame. The assumption of pervasive change puts a large burden on both the bandwidth and the calculation capabilities of the graphics pipeline.

GEOMETRY AND PIXEL THROUGHPUT

As a baseline, we'll start with some data and cycle counting of a reasonable workload for an interactive application. PC graphics hardware is capable of this throughput. As an example, here is a bandwidth analysis of a 400 MHz Intel Pentium II™ PC with an Nvidia RIVA TNT™ graphics processor. This analysis does not derive from a specific application, but is simply a counting exercise. Many applications push one or more of these limits, but few programs stress all axes. For a typical application to achieve 1M triangles/second, 100M 32-bit pixels/second, and 2 textures/pixel:

- Geometry: 1M triangles * 3 vertices/triangle * 32 bytes/vertex ≈ 100 MB/second. Triangle data crosses the bus 3-5 times (read, transformed, and written by the CPU, then read by the graphics processor), so simply copying triangle data requires 300-500 MB/second on the PC buses.
- Pixels: 100M pixels * 8 bytes/pixel (32-bit RGBA, 32-bit Z/stencil) = 800 MB/second; with 50% overhead for read-modify-write, this requires 1.2 GB/second.
- Textures: 2 textures/pixel * 4 texels/texture * 2 bytes/texel = 16 bytes/pixel, or 1.6 GB/second; a texture cache can create up to 4X reuse efficiency, so this requires 400 MB/second.

Assumptions here include: 32-byte vertices are Direct3D™ TLVertices (X,Y,Z,R,G,B,A,F,SR,SG,SB,W); triangle setup is done on the graphics processor; bilinear texture filtering; 16-bit texels are R5G6B5; 50% of pixels are written after the Z-buffer read/compare.

Transferring triangle vertex data from the CPU to the graphics processor is commonly the bottleneck. This is different from typical workstations or the PCs of just 1 year ago, when transform and lighting calculation, fill rate, or texture rate were the limiting factors.

GEOMETRY REPRESENTATION

As pixel shading, texturing, and fill rates rise, the most constrained bottleneck in the system will increasingly be the creation and transfer of geometry information. The data required to represent a triangle comprises the bulk of system bus traffic in an aggressive 3D application.
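
As a sanity check on the GEOMETRY AND PIXEL THROUGHPUT figures, the following C++ sketch reproduces the counting exercise under the stated assumptions. The TLVertex32 layout is an illustrative guess at a 32-byte transformed-and-lit vertex (packed diffuse and specular colors plus one texture-coordinate pair); it is not taken from the paper or from any Direct3D header, and the exact arithmetic differs slightly from the rounded numbers above (96 MB/s vs. the quoted 100 MB/s).

    // Back-of-the-envelope bandwidth counting for the workload described above.
    // The struct layout is an illustrative guess at a 32-byte transformed-and-lit
    // vertex; the field list (X,Y,Z,R,G,B,A,F,SR,SG,SB,W) in the abstract is only
    // approximated here, not copied from any particular API header.
    #include <cstdint>
    #include <cstdio>

    struct TLVertex32 {                 // assumed 32-byte post-transform vertex
        float    x, y, z, w;            // screen-space position + reciprocal W
        uint32_t diffuse;               // packed R, G, B, A
        uint32_t specular;              // packed SR, SG, SB + fog factor F
        float    u, v;                  // one set of texture coordinates
    };
    static_assert(sizeof(TLVertex32) == 32, "vertex must be 32 bytes");

    int main() {
        const double triangles_per_s = 1e6;     // 1M triangles/second
        const double pixels_per_s    = 100e6;   // 100M pixels/second
        const double textures_per_px = 2.0;     // dual texturing
        const double texels_per_tex  = 4.0;     // bilinear filtering
        const double bytes_per_texel = 2.0;     // R5G6B5

        // Geometry: independent triangles, 3 vertices each, crossing the bus 3-5x.
        double geom = triangles_per_s * 3 * sizeof(TLVertex32);
        std::printf("geometry (1 pass): %.0f MB/s, 3-5 passes: %.0f-%.0f MB/s\n",
                    geom / 1e6, 3 * geom / 1e6, 5 * geom / 1e6);

        // Pixels: 4 bytes color + 4 bytes Z/stencil, plus 50% read-modify-write overhead.
        double pix = pixels_per_s * 8.0;
        std::printf("pixels: %.0f MB/s, with RMW overhead: %.1f GB/s\n",
                    pix / 1e6, 1.5 * pix / 1e9);

        // Textures: raw fetch rate divided by an assumed 4x cache reuse factor.
        double tex = pixels_per_s * textures_per_px * texels_per_tex * bytes_per_texel;
        std::printf("textures: raw %.1f GB/s, with 4x cache reuse: %.0f MB/s\n",
                    tex / 1e9, tex / 4.0 / 1e6);
        return 0;
    }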
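
The abstract stops before describing remedies for the geometry-transfer bottleneck. One standard way to reduce per-triangle bus traffic, shared-vertex geometry (indexed lists or triangle strips), is easy to quantify; the sketch below is not from the paper, and the sharing ratios are rough rules of thumb for well-ordered meshes rather than measured values.

    // Approximate vertex traffic per triangle for common geometry encodings,
    // assuming the same 32-byte vertex as above and 16-bit indices.
    // These encodings are standard practice, not proposals from the paper.
    #include <cstdio>

    int main() {
        const double vertex_bytes = 32.0;
        const double index_bytes  = 2.0;   // 16-bit index

        // Independent triangle list: every triangle ships 3 full vertices.
        double independent = 3 * vertex_bytes;

        // Indexed list over a well-shared mesh: in a regular grid each vertex is
        // shared by ~6 triangles, so roughly 0.5 new vertices per triangle,
        // plus 3 indices per triangle.
        double indexed = 0.5 * vertex_bytes + 3 * index_bytes;

        // Long triangle strip: roughly 1 new vertex per triangle.
        double strip = 1.0 * vertex_bytes;

        std::printf("independent list: %.0f bytes/tri\n", independent);
        std::printf("indexed mesh:     %.0f bytes/tri (approx.)\n", indexed);
        std::printf("long strip:       %.0f bytes/tri\n", strip);
        return 0;
    }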