Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors: Latest Papers

A model-based methodology for application specific energy efficient data path design using FPGAs

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030706

Sumit Mohanty, S. Choi, Ju-wook Jang, V. Prasanna

Abstract: This paper presents a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain-specific modeling, coarse-grained performance evaluation, design space exploration, and low-level simulation to understand the tradeoffs between energy, latency, and area. The domain-specific modeling technique defines a high-level model by identifying various components and parameters specific to a domain that affect the system-wide energy dissipation. A domain is a family of architectures and corresponding algorithms for a given application kernel. The high-level model also consists of functions for estimating energy, latency, and area that facilitate tradeoff analysis. Design space exploration (DSE) analyzes the design space defined by the domain and selects a set of designs. Low-level simulations are used for accurate performance estimation for the designs selected by the DSE and also for final design selection. We illustrate our methodology using a family of architectures and algorithms for matrix multiplication. The designs identified by our methodology demonstrate tradeoffs among energy, latency, and area.
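The abstract says the design space exploration step selects a set of designs that trade off energy, latency, and area, but it does not state the selection rule. The sketch below shows one common way such a set could be chosen, Pareto filtering, purely as an illustration. The `Design` type, the design names, and all numbers are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """A candidate design with estimated metrics (hypothetical units)."""
    name: str
    energy: float
    latency: float
    area: float


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    am = (a.energy, a.latency, a.area)
    bm = (b.energy, b.latency, b.area)
    no_worse = all(x <= y for x, y in zip(am, bm))
    strictly_better = any(x < y for x, y in zip(am, bm))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical estimates, as a high-level model might produce them.
designs = [
    Design("A", energy=10.0, latency=5.0, area=100.0),
    Design("B", energy=8.0, latency=7.0, area=120.0),
    Design("C", energy=12.0, latency=6.0, area=110.0),  # worse than A on all three
    Design("D", energy=9.0, latency=4.0, area=200.0),
]
front = pareto_front(designs)
print([d.name for d in front])
```

In a flow like the one the abstract describes, only the surviving designs would then go on to the more expensive low-level simulation.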

Citations: 15

High-radix logarithm with selection by rounding

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030708

José-Alejandro Piñeiro, M. Ercegovac, J. Bruguera

Citations: 24

Implications of programmable general purpose processors for compression/encryption applications

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030722

Byeong Kil Lee, L. John

Abstract: With the growth of the Internet and mobile communication industry, multimedia applications form a dominant computer workload. Media workloads are typically executed on application-specific integrated circuits (ASICs), application-specific processors (ASPs), or general-purpose processors (GPPs). GPPs are flexible and accommodate changes in applications and algorithms better than ASICs and ASPs. However, executing these applications on GPPs comes at a high cost. In this paper, we analyze media compression/decompression algorithms from the perspective of the overhead of executing them on a programmable general-purpose processor versus ASPs. We choose nine encode/decode programs from audio, image/video, and encryption applications. The instruction mix, memory access, and parallelism aspects during the execution of these programs are analyzed. Memory access latency is observed to be the main factor influencing the execution time on general-purpose processors. Most of these compression/decompression algorithms involve processing the data through execution phases (e.g., quantization, encoding), and temporary results are stored and retrieved between these phases. A metric called overhead memory-access bandwidth per input/output byte is defined to characterize the temporary memory activity of each application. We observe that more than 90% of the memory accesses made by these programs are temporary data stores and loads arising from the general-purpose nature of the execution platform. We also study the data parallelism in these applications, indicating the ability of instruction-level and data-level parallel processors to exploit the parallelism in these applications. The parallelism ranges from 6 to 529 in encode processes and 18 to 558 in decode processes.
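The abstract names a metric, overhead memory-access bandwidth per input/output byte, without giving its formula. The sketch below shows one plausible reading, assumed here only for illustration: memory traffic beyond the bytes that must be read as input or written as output, divided by those input/output bytes. The function names and the example byte counts are hypothetical.

```python
def overhead_per_io_byte(total_mem_bytes: int, input_bytes: int, output_bytes: int) -> float:
    """Assumed definition: temporary (overhead) bytes moved per input/output byte."""
    io_bytes = input_bytes + output_bytes
    if io_bytes <= 0:
        raise ValueError("input_bytes + output_bytes must be positive")
    overhead_bytes = total_mem_bytes - io_bytes
    return overhead_bytes / io_bytes


def temporary_fraction(total_mem_bytes: int, input_bytes: int, output_bytes: int) -> float:
    """Fraction of all memory traffic that is temporary under the same assumption."""
    if total_mem_bytes <= 0:
        raise ValueError("total_mem_bytes must be positive")
    return (total_mem_bytes - input_bytes - output_bytes) / total_mem_bytes


# Hypothetical program: 1000 bytes of memory traffic to process 60 input
# bytes and produce 40 output bytes.
print(overhead_per_io_byte(1000, 60, 40))
print(temporary_fraction(1000, 60, 40))
```

Under this reading, a program whose memory traffic is 90% temporary moves nine overhead bytes for every byte of real input or output.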

Citations: 11

Efficient conversion from binary to multi-digit multi-dimensional logarithmic number systems using arrays of range addressable look-up tables

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030711

R. Muscedere, V. Dimitrov, G. Jullien, W. Miller

Citations: 10

Refining instruction set architecture for high-performance multimedia processing in constrained environments

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030724

R. Lee, A. M. Fiskiran, Z. Shi, Xiao Yang

Abstract: Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modern microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low power and low cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever-growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small-footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test, and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.
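For readers unfamiliar with the term, a subword-parallel instruction treats one machine word as several independent narrower lanes and operates on all of them at once. The sketch below emulates a generic wrap-around add on four 16-bit lanes packed into a 64-bit word. It illustrates the general idea only and is not PLX's instruction set or semantics; `pack16`, `unpack16`, and `padd16` are hypothetical helper names.

```python
from typing import List

HIGH = 0x8000800080008000  # top bit of each 16-bit lane
LOW = 0x7FFF7FFF7FFF7FFF   # lower 15 bits of each 16-bit lane


def pack16(lanes: List[int]) -> int:
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFFFF) << (16 * i)
    return word


def unpack16(word: int) -> List[int]:
    """Split a 64-bit word back into four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def padd16(a: int, b: int) -> int:
    """Add four 16-bit lanes at once; each lane wraps modulo 2**16."""
    # Add the low 15 bits of every lane; carries cannot cross a lane boundary.
    partial = (a & LOW) + (b & LOW)
    # Fold in the top bit of each lane with XOR, discarding the carry out.
    return partial ^ ((a ^ b) & HIGH)


x = pack16([1, 0xFFFF, 0x7FFF, 2])
y = pack16([1, 1, 1, 3])
print([hex(v) for v in unpack16(padd16(x, y))])
```

Widening the word from 64 to 128 bits would double the number of lanes processed per operation, which is the kind of scaling the abstract attributes to PLX's datapath scalability.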

Citations: 17

A VLSI architecture for object recognition using tree matching

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030731

K. Sitaraman, N. Ranganathan, A. Ejnioui

Abstract: The problem of tree pattern matching for object recognition in images is computationally intensive in nature. In two-dimensional images, the objects can be represented through multiscale decomposition as tree structures. The pattern tree representing an object can be matched with a subject tree representing an image in order to detect the objects within the image. In this paper, we describe a new systolic algorithm and its realization as a VLSI chip for tree pattern matching. The hardware algorithm is based on a linear array of processing elements (PEs) where the pattern matching is done in a pipelined fashion relying on nearest-neighbor communication between the PEs. Subject and pattern trees of arbitrary length can be processed using a fixed-size PE array. The algorithm has an improved execution time of O(⌈m/a⌉n), where m, a, and n are the sizes of the pattern tree, processor array, and subject tree, respectively. A prototype CMOS VLSI chip implementing the proposed algorithm has been designed and verified. It is shown that the hardware algorithm proposed in this work represents a significant improvement in terms of computational complexity, data flow, and architecture over the ones previously proposed for this problem.
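To get a feel for the stated bound, the sketch below evaluates ⌈m/a⌉·n for a few sizes. It ignores constant factors and assumes, as one natural reading of the bound, that a pattern tree larger than the array is handled in ⌈m/a⌉ passes over the subject tree. The function name and the example sizes are hypothetical.

```python
def estimated_steps(m: int, a: int, n: int) -> int:
    """Evaluate ceil(m / a) * n, the growth term in the stated bound."""
    if m < 0 or n < 0 or a <= 0:
        raise ValueError("m and n must be non-negative and a must be positive")
    passes = -(-m // a)  # integer ceiling of m / a
    return passes * n


# Hypothetical sizes: pattern tree of 10 nodes, subject tree of 100 nodes.
for array_size in (2, 4, 10, 16):
    print(array_size, estimated_steps(10, array_size, 100))
```

Once the array is at least as large as the pattern tree, the term stops shrinking and the estimate is linear in the subject tree size.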

Citations: 4

On the propagation of faults and their detection in a hardware implementation of the Advanced Encryption Standard

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030729

G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, V. Piuri

Citations: 32

A component architecture for FPGA-based, DSP system design

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030703

G. Spivey, S. Bhattacharyya, K. Nakajima

Citations: 7

Design and evaluation of a multimedia computing architecture based on a 3D graphics pipeline

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030723

C. Y. Chung, R. Managuli, Yongmin Kim

Abstract: With the innovation and integration of media objects in multimedia applications, the importance of architectural support for different types of media objects, e.g., image, video, and graphics, in one platform has significantly increased. While several approaches based on vector or VLIW (very long instruction word) architectures, e.g., Vector-IRAM and Imagine, have been pursued, they are not as effective as dedicated graphics pipelines for high-performance 3D graphics. We have explored a new programmable computing architecture based on a 3D graphics pipeline, which utilizes dedicated hardware resources in the 3D graphics pipeline for other types of multimedia computing. Adding programmable flexibility to a graphics pipeline for texture mapping has proven to be effective, e.g., pixel shader. However, due to the diversity of imaging and video processing applications, there are several challenges associated with converting a fixed graphics pipeline to a flexible multimedia computing engine. In this paper, we identify the additional architectural requirements, introduce the proposed architecture with extension details, and present the results of the performance evaluation. With cycle-accurate simulation of several benchmark functions, we have verified that the proposed architecture outperforms a modern, powerful media processor in imaging and video processing by a factor of 1.3 to 7.5. The 3D graphics performance would not change much because the additional pipeline stages for the extension result in longer pipeline latency but similar throughput.
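The closing claim, that extra pipeline stages lengthen latency but leave throughput about the same, follows from basic pipeline arithmetic. The sketch below assumes an idealized pipeline that accepts one item per cycle and never stalls. The stage counts and item count are hypothetical and are not taken from the paper.

```python
def cycles_to_finish(stages: int, items: int) -> int:
    """Cycles for `items` to drain through an ideal `stages`-deep pipeline."""
    if stages <= 0 or items <= 0:
        raise ValueError("stages and items must be positive")
    # The first item needs `stages` cycles; each later item adds one cycle.
    return stages + items - 1


def throughput(stages: int, items: int) -> float:
    """Average items completed per cycle over the whole run."""
    return items / cycles_to_finish(stages, items)


ITEMS = 1_000_000           # hypothetical number of pixels or fragments
BASE, EXTENDED = 10, 14     # hypothetical stage counts before and after the extension

print("latency (cycles):", BASE, "vs", EXTENDED)
print("throughput (items/cycle):", throughput(BASE, ITEMS), "vs", throughput(EXTENDED, ITEMS))
```

In this model the per-item latency grows by 40%, while the throughput over a long stream changes by only a few parts per million.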

Citations: 0

A combined interval and floating-point comparator/selector

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. Pub Date: 2002-07-17. DOI: 10.1109/ASAP.2002.1030720

A. Akkas

Citations: 13