Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003: Latest Papers, Page 2

An efficient PIM (processor-in-memory) architecture for motion estimation

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212852

Jung-Yup Kang, S. Gupta, Saurabh Shah, J. Gaudiot

Abstract: Motion estimation is the most time-consuming stage of MPEG-family encoding and reportedly absorbs up to 90% of the total execution time of MPEG processing. We therefore propose a hardware/software co-design paradigm that uses a PIM module to execute motion estimation operations efficiently. The PIM module reduces the memory access penalty caused by the large number of memory accesses. We segment the PIM module into smaller modules so that each can execute operations in parallel. Parallel execution, however, incurs critical overheads: a huge amount of data must be replicated to many of these smaller PIM modules, and the replication requires not only many additional memory accesses but also additional address-generation calculations. We therefore also present an efficient data distribution mechanism to support parallel execution among the smaller PIM modules. With our paradigm, the host processor is relieved of the computationally intensive and data-intensive workloads of motion estimation. We observed up to a 2034× reduction in the number of memory accesses and up to a 439× performance improvement for motion estimation operations when using our computing paradigm.
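The abstract attributes the cost of motion estimation to its large number of memory accesses. A minimal sketch of generic full-search block matching shows where those accesses come from. It is not the paper's PIM design: the sum-of-absolute-differences (SAD) cost and the names `sad` and `full_search` are assumptions made for illustration, since the abstract does not name a matching criterion.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            # Two pixel reads per position: one from each frame.
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Return (best_sad, (dx, dy)) over every displacement within `radius`
    that keeps the candidate block inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside_x = 0 <= bx + dx <= width - n
            inside_y = 0 <= by + dy <= height - n
            if not (inside_x and inside_y):
                continue  # skip candidates that fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

In this sketch every candidate displacement reads 2·n·n pixels, so a single block can cost up to (2·radius + 1)^2 · 2·n^2 reads, which is the kind of memory traffic the abstract refers to.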

Citations: 12

Color space conversion for MPEG decoding on FPGA-augmented TriMedia processor

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212848

M. Sima, S. Vassiliadis, S. Cotofana, J. V. Eijndhoven

Citations: 21

A cryptographic processor for arbitrary elliptic curves over GF(2^m)

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212867

H. Eberle, N. Gura, S. C. Shantz

Abstract: We describe a cryptographic processor for elliptic curve cryptography (ECC). ECC is evolving as an attractive alternative to other public-key schemes such as RSA by offering the smallest key size and the highest strength per bit. The processor performs point multiplication for elliptic curves over binary polynomial fields GF(2^m). In contrast to other designs that support only one curve at a time, our processor can handle arbitrary curves without requiring reconfiguration. More specifically, it can handle both named curves as standardized by NIST and any other generic curves up to a field degree of 255. Efficient support for arbitrary curves is particularly important for the targeted server applications, which must handle requests for secure connections generated by a multitude of heterogeneous client devices. Such requests may specify curves that are infrequently used or not even known at implementation time. Our processor implements 256-bit modular multiplication, division, addition, and squaring. The multiplier constitutes the core function, as it executes the bulk of the point multiplication algorithm. We present a novel digit-serial modular multiplier that uses a hybrid architecture to perform the reduction operation needed to reduce the multiplication result: hardwired logic is used for fast reduction for named curves, and the multiplier circuit is reused for reduction for generic curves. The performance of our FPGA-based prototype, running at a clock frequency of 66.4 MHz, is 6955 point multiplications per second for named curves over GF(2^163) and 3308 point multiplications per second for generic curves over GF(2^163).
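The field arithmetic behind this abstract can be sketched as a bit-serial multiplication in GF(2^m) with a reduction step after every shift. This is a textbook shift-and-add formulation, not the paper's digit-serial hybrid multiplier. The function name `gf2m_mul`, the integer bit-vector encoding, and the constant names are assumptions for illustration; the two reduction polynomials are stated from FIPS 197 and the NIST curve recommendations, not from the abstract.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m).

    Elements are integers whose bit i is the coefficient of x^i, with
    a, b < 2**m. `poly` is the irreducible reduction polynomial, encoded
    the same way with bit m set.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if (a >> m) & 1:
            a ^= poly          # degree reached m: reduce modulo poly
    return result


# x^8 + x^4 + x^3 + x + 1, the AES field polynomial (FIPS 197).
AES_POLY = 0x11B

# x^163 + x^7 + x^6 + x^3 + 1, the reduction polynomial of the NIST
# degree-163 binary curves.
NIST_163_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
```

A generic multiplier of this kind takes the polynomial as data, which loosely corresponds to the abstract's generic-curve case; hardwiring the reduction for a fixed polynomial corresponds to its named-curve case.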

Citations: 33

Physical planning for on-chip multiprocessor networks and switch fabrics

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212833

Terry Tao Ye, G. Micheli

Citations: 37

Reconfigurable computing and electronic nanotechnology

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212837

S. Goldstein, M. Budiu, M. Mishra, Girish Venkataramani

Abstract: We examine the opportunities brought about by recent progress in electronic nanotechnology and describe the methods needed to harness them for building a new computer architecture. In this process we decompose some traditional abstractions, such as the transistor, into fine-grain pieces, such as signal restoration and input-output isolation. We also show how we can forgo the extreme reliability of CMOS circuits for low-cost chemical self-assembly at the expense of large manufacturing defect densities. We discuss advanced testing methods that can be used to recover perfect functionality from unreliable parts. We proceed to show how the molecular switch, the regularity of the circuits created by self-assembly, and the high defect densities logically require the use of reconfigurable hardware as a basic building block for hardware design. We then capitalize on the convergence of compilation and hardware synthesis (which takes place when programming reconfigurable hardware) to propose the complete elimination of the instruction-set architecture from the system architecture, and the synthesis of asynchronous dataflow machines directly from high-level programming languages, such as C. We discuss in some detail a scalable compilation system that performs this task.

Citations: 36

Area and time efficient modular multiplication of large integers

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212863

Viktor Bunimov, M. Schimmler

Abstract: A new modular multiplication algorithm and its corresponding architecture are presented. It is optimised with respect to hardware complexity and latency. Based on the dataflow of the well-known interleaved modular multiplication, the product of two n-bit integers X and Y modulo M is computed by n iterations of a simple loop. The loop consists of a single carry-save addition, a comparison of constant complexity, and a table lookup, where the table contains 6 precomputed values and two constants. By this construction, the arithmetical complexity of the modular multiplication is reduced to n additions without carry propagation in total, which leads to a speedup of at least two in comparison to all previously known methods. The presentation begins with a first algorithm, A2, which implements the new idea of combining carry-save addition and constant-time comparison. A2 is not optimal with respect to area and time; its correctness is proven. With a small amount of precomputation, the loop of A2 can be modified so that the effort within the loop is minimised. This leads to the algorithm A3, which is also verified.
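For orientation, the well-known interleaved modular multiplication that the abstract takes as its starting point can be sketched as below. This is the classical baseline only, with full-width additions and comparisons; it is not the paper's algorithms A2 or A3, which the abstract says replace these steps with carry-save addition, a constant-time comparison, and a table lookup. The function and parameter names are assumptions for illustration.

```python
def interleaved_modmul(x, y, m, n_bits):
    """Compute (x * y) % m by scanning the bits of x from the top.

    Assumes 0 <= x < 2**n_bits and 0 <= y < m. One loop iteration is
    executed per bit of x, so the product is never formed in full.
    """
    p = 0
    for i in reversed(range(n_bits)):
        p <<= 1                      # double the partial result
        if (x >> i) & 1:
            p += y                   # add y when the current bit of x is set
        # p was below m before this iteration, so now p < 3 * m and
        # at most two subtractions bring it back into range.
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p
```

Each iteration here performs a carry-propagating addition and up to two full comparisons against m, which are the costs the abstract's construction is aimed at.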

Citations: 46

Complex division with prescaling of operands

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212854

J. Muller

Citations: 32

Hardware implementation of an elliptic curve processor over GF(p)

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212866

S. Yalcin, L. Batina, B. Preneel, J. Vandewalle

Citations: 134

Iterative methods for logarithmic subtraction

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212855

M. Arnold

Abstract: The logarithmic number system (LNS) offers much better performance (in terms of power, speed, and area) than floating point for multiplication, division, powers, and roots. Moderate-precision addition (of like signs) in LNS can generally be done with table lookup followed by interpolation, whose implementation can be as efficient as, or more efficient than, the equivalent-precision floating-point adder. The problem with LNS is the size of the table needed for subtraction. We consider iterative methods for logarithmic subtraction. The basis for the novel methods proposed here is that the subtraction logarithm is the inverse of the addition logarithm. Although the mathematics for this kind of logarithmic subtraction was first described during the time of Gauss, no modern designer has implemented an algorithm like the one proposed here, which performs a binary search followed by an inverse interpolation. Additionally, we propose a novel initialization step for the binary search, which doubles the speed of the algorithm compared to a naive implementation. Combining the proposed method with other iterative methods may reduce the average execution time further. Synthesis results indicate that the proposed methods are feasible for FPGA implementation.
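The inverse relationship the abstract relies on can be made concrete. With the addition logarithm s(z) = log2(1 + 2^z), solving s(z) = u gives z = log2(2^u - 1), which is the subtraction logarithm for u > 0. The sketch below finds that z by plain bisection on s. It is a floating-point illustration under stated assumptions, not the paper's method: `math.log2` stands in for the table lookup and interpolation, the search bounds are arbitrary, and neither the inverse-interpolation step nor the proposed initialization step is modelled. All function names are hypothetical.

```python
import math


def add_log(z):
    """Addition logarithm s(z) = log2(1 + 2**z)."""
    return math.log2(1.0 + 2.0 ** z)


def sub_log_by_search(u, lo=-64.0, hi=64.0, iterations=60):
    """Approximate log2(2**u - 1) for u > 0 by bisecting add_log.

    add_log is increasing, so the z with add_log(z) == u is unique.
    The bounds lo and hi are assumed to bracket that z.
    """
    if u <= 0.0:
        raise ValueError("u must be positive")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if add_log(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lns_subtract(a, b):
    """Return log2(2**a - 2**b) for a > b, using only the search above."""
    return b + sub_log_by_search(a - b)
```

For example, `lns_subtract(3.0, 1.0)` represents 8 - 2 and should come out close to log2(6); the plain bisection used here corresponds to what the abstract calls the naive variant.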

Citations: 8

Using media processors for low-memory AES implementation

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212838

J. Irwin, D. Page

Citations: 15