A chip set for a ray-casting engine

VLSI Signal Processing, IX. Pub Date: 1996-10-30. DOI: 10.1109/VLSISP.1996.558335

G. Hekstra, E. Deprettere


Citations: 2

Abstract

Rendering artificial scenes is an appealing example of a class of problems that lead to complex, data-dependent algorithms, for which efficient software/hardware mapping techniques have to be envisaged. We present one of the ASICs in our rendering system to illustrate our design methodology in more detail. The first step in the algorithm-architecture design is to reformulate an existing naive algorithm so that, as far as possible, only significant operations are performed. The resulting algorithm has a nested loop structure with non-manifest, data-dependent loop bounds, which renders classical parallelisation techniques useless. The second step is to greatly reduce the overall computation time of the algorithm by reducing the computational complexity of the innermost loop operation. The third and last step is to map this algorithm onto a pipelined architecture in which the pipeline stages (functional units within an ASIC) implement different loop levels. Because of this data dependence, the functional units that implement the parts of the loops vary over time, both in their execution time and in the amount of data they produce for the following pipeline stages. Since the execution times of the various pipeline stages change, the location of the bottleneck also shifts over time. Hence the goal is not to keep all pipeline stages continually busy, but to keep the throughput of the most critical, innermost loop operation as high as possible.
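The mapping of loop levels onto pipeline stages can be illustrated with a small software analogy. The sketch below is not the authors' architecture: the abstract does not name the loop levels, so the rays/cells/patches hierarchy, the toy data, and all function names are assumptions made purely for illustration. It shows a three-level loop nest whose inner bounds are only known once the data is read, and the same computation split into one generator per loop level, so that each stage emits a varying amount of work for the next.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy data (not from the paper): which cells each ray
# traverses, and which patches each cell holds. The list lengths are the
# non-manifest loop bounds: they are unknown until the data is read.
CELLS_ON_RAY: Dict[int, List[str]] = {
    0: ["c0", "c1"],
    1: [],
    2: ["c1", "c2", "c3"],
}
PATCHES_IN_CELL: Dict[str, List[int]] = {
    "c0": [],
    "c1": [10, 11],
    "c2": [12],
    "c3": [13, 14, 15],
}

Test = Tuple[int, str, int]


def naive_loop_nest() -> List[Test]:
    """Reference version: three nested loops with data-dependent bounds."""
    tests: List[Test] = []
    for ray in sorted(CELLS_ON_RAY):                # outer loop
        for cell in CELLS_ON_RAY[ray]:              # bound depends on the ray
            for patch in PATCHES_IN_CELL[cell]:     # bound depends on the cell
                tests.append((ray, cell, patch))    # innermost operation
    return tests


def ray_stage() -> Iterator[int]:
    """Stage 1: the outer loop level, emitting one ray at a time."""
    for ray in sorted(CELLS_ON_RAY):
        yield ray


def cell_stage(rays: Iterator[int]) -> Iterator[Tuple[int, str]]:
    """Stage 2: the middle loop level, emitting zero or more cells per ray."""
    for ray in rays:
        for cell in CELLS_ON_RAY[ray]:
            yield ray, cell


def patch_stage(pairs: Iterator[Tuple[int, str]]) -> Iterator[Test]:
    """Stage 3: the innermost loop level, emitting zero or more tests per cell."""
    for ray, cell in pairs:
        for patch in PATCHES_IN_CELL[cell]:
            yield ray, cell, patch


def pipelined() -> List[Test]:
    """Chain the stages; each one consumes the previous stage's output stream."""
    return list(patch_stage(cell_stage(ray_stage())))


if __name__ == "__main__":
    print(len(list(cell_stage(ray_stage()))), "cell visits")
    print(len(pipelined()), "innermost tests")
```

In this toy data, ray 1 produces no work for the later stages while ray 2 produces most of it, which is the kind of uneven, data-dependent load the abstract describes. Generators only model the dataflow between loop levels; they say nothing about the timing, buffering, or hardware cost that the actual ASIC design has to address.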


