Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Latest papers (page 3)

Optimized data-reuse in processor arrays

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10024

Sebastian Siegel, R. Merker

Citations: 6

A hierarchical classification scheme to derive interprocess communication in process networks

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10025

A. Turjan, B. Kienhuis, E. Deprettere

Abstract: The Compaan compiler automatically derives a process network (PN) description from an application written in Matlab. The basic element of a PN is a producer/consumer (P/C) pair. Four different communication patterns for a P/C pair have been identified, and the complexity of the communication structure depends on the pattern involved. Therefore, to obtain cost-efficient process networks, our compiler automatically identifies the communication pattern of each P/C pair. This problem is equivalent to integer linear programming (ILP) and thus, in general, cannot be solved efficiently. In this paper we present simpler techniques for classifying the interprocess communication in a PN. In some cases, however, these techniques cannot produce an answer, and an ILP test still has to be applied. We therefore introduce a hierarchical classification scheme that classifies the interprocess communication correctly while relying on ILP far less often: only 5% of the cases still require ILP, and the techniques presented classify the remaining 95% correctly.

Citations: 3
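To make the classification problem in the abstract above more concrete, here is a minimal brute-force sketch. The abstract does not name the four communication patterns; the sketch assumes they are the combinations of in-order/out-of-order and with/without multiplicity, and it classifies by plain enumeration rather than by the paper's hierarchical scheme or an ILP test. The function name `classify` and the list-of-token-indices input format are hypothetical.

```python
def classify(reads):
    """Classify one producer/consumer pair by enumeration.

    `reads` lists, in consumer iteration order, the index of the
    producer token consumed at each step (hypothetical input format).
    Assumption: the four patterns are in-order/out-of-order combined
    with with/without multiplicity; the abstract does not name them.
    """
    # In-order: tokens are consumed in the order they were produced.
    in_order = all(a <= b for a, b in zip(reads, reads[1:]))
    # Multiplicity: at least one token is consumed more than once.
    multiplicity = len(set(reads)) < len(reads)
    return (
        "in-order" if in_order else "out-of-order",
        "with multiplicity" if multiplicity else "without multiplicity",
    )


if __name__ == "__main__":
    for reads in ([0, 1, 2, 3], [0, 0, 1, 1], [3, 2, 1, 0], [0, 1, 0, 1]):
        print(reads, classify(reads))
```

Enumeration like this only works when the iteration domains are small and fully known, which is presumably why a compiler needs symbolic tests instead.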

Architectural support for arithmetic in optimal extension fields

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10004

J. Großschädl, Sandeep S. Kumar, C. Paar

Abstract: Public-key cryptosystems generally involve computation-intensive arithmetic operations, making them impractical for software implementation on constrained devices such as smart cards. We investigate the potential of architectural enhancements and instruction set extensions for low-level arithmetic used in public-key cryptography, most notably multiplication in finite fields of large order. The present work focuses on a special type of finite field, the so-called optimal extension fields GF(p^m), where p is a pseudo-Mersenne (PM) prime of the form p = 2^n - c that fits into a single register. Based on the MIPS32 instruction set architecture, we introduce two custom instructions to accelerate the reduction modulo a PM prime. Moreover, we show that multiplication in an optimal extension field can take advantage of a multiply/accumulate unit with a wide accumulator, so that a certain number of 64-bit products can be summed up without overflow. The proposed extensions support a wide range of PM primes and allow a reduction modulo 2^n - c to complete in only four clock cycles when n ≤ 32.

Citations: 12
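The reduction modulo a pseudo-Mersenne prime mentioned in the abstract above relies on the identity 2^n ≡ c (mod 2^n - c), which lets the high part of a product be folded into the low part with one small multiplication. The following is a minimal software sketch of that folding idea, not the custom instructions proposed in the paper; the function name `reduce_pm` and the example parameters n = 32, c = 5 are assumptions chosen for illustration.

```python
def reduce_pm(x: int, n: int, c: int) -> int:
    """Reduce a non-negative x modulo p = 2**n - c by repeated folding.

    Uses 2**n ≡ c (mod p): x = hi * 2**n + lo ≡ hi * c + lo (mod p).
    Assumes 0 < c < 2**n; in practice c is small.
    """
    p = (1 << n) - c
    mask = (1 << n) - 1
    # Fold the bits above position n back into the low n bits.
    while x >> n:
        x = (x >> n) * c + (x & mask)
    # x now fits in n bits; a few subtractions finish the reduction.
    while x >= p:
        x -= p
    return x


if __name__ == "__main__":
    n, c = 32, 5                      # illustrative parameters (assumption)
    p = (1 << n) - c
    a, b = 123456789, 987654321
    # The folded result agrees with Python's built-in modulo.
    print(reduce_pm(a * b, n, c) == (a * b) % p)
```

Because c is small, each fold shrinks the operand quickly, so the reduction needs only a few folds and no general division.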

A novel highly reliable low-power nano architecture when von Neumann augments Kolmogorov

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10021

Valeriu Beiu

Abstract: This work presents a novel architecture that is both device and circuit independent. The starting idea is that computations can be performed in three fundamentally different ways: entirely digital (using Boolean gates), entirely analog (using analog circuits), or mixed (using both digital and analog circuits). The boundaries between these are sometimes very thin. As an example, a threshold logic gate is already mixed: even if the inputs and the output are Boolean, the weighted sum of inputs is a multiple-valued logic signal, i.e. a low-precision analog signal. It has already been suggested that, at least for CMOS, a mixed analog/digital approach is the most power-efficient solution. Still, the main disadvantages of using analog circuits are (i) their more complex (handcrafted) design and (ii) their (expected) lower reliability (signal-to-noise or precision), which will be exacerbated by scaling. Here, we show how both disadvantages could be tackled. A constructive solution for Kolmogorov's superposition and (multi-threshold) threshold logic synthesis could be used for automating the design. Digital or threshold logic circuits will compensate for the accumulation of noise in the cascaded (very) low-precision analog circuits. These digital circuits will also contribute to a von Neumann multiplexing scheme used to augment the defect and fault tolerance of the architecture. A few examples show how this architectural approach could be mapped on top of a given (nano) technology.

Citations: 29
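The remark in the abstract above that a threshold logic gate is "already mixed" can be illustrated with a small behavioral model: the inputs and output are Boolean, but the intermediate weighted sum takes several distinct values. This is a generic textbook sketch, not a model of the proposed architecture; the function names and the 3-input majority example are assumptions.

```python
def weighted_sum(inputs, weights):
    """Multiple-valued intermediate signal of a threshold gate."""
    return sum(w * x for w, x in zip(weights, inputs))


def threshold_gate(inputs, weights, threshold):
    """Boolean output: 1 if the weighted sum reaches the threshold."""
    return 1 if weighted_sum(inputs, weights) >= threshold else 0


if __name__ == "__main__":
    weights, threshold = (1, 1, 1), 2   # 3-input majority (illustrative)
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s = weighted_sum((a, b, c), weights)
                out = threshold_gate((a, b, c), weights, threshold)
                print((a, b, c), "sum =", s, "out =", out)
```

With unit weights the sum ranges over four levels (0 to 3), which is the low-precision analog quantity the abstract refers to; the comparison against the threshold restores a Boolean value.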

Design of the QBIC wearable computing platform

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10001

O. Amft, M. Lauffer, Stijn Ossevoort, Fabrizio Macaluso, P. Lukowicz, G. Tröster

Citations: 74

Optimizing the memory bandwidth with loop morphing

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10020

J. I. Gómez, P. Marchal, Sven Verdoolaege, L. Piñuel, F. Catthoor

Abstract: The memory bandwidth largely determines the performance of embedded systems. However, compilers very often ignore the actual behavior of the memory architecture, causing a large performance loss. To better utilize the memory bandwidth, several researchers have introduced instruction scheduling and data assignment techniques. Because these only optimize the bandwidth inside each basic block, they often fail to use all available bandwidth. Loop fusion is an interesting alternative for optimizing the memory access schedule more globally. By fusing loops we increase the number of independent memory operations inside each basic block. The compiler can then better exploit the available bandwidth and increase the system's performance. However, existing fusion techniques can only combine loops with conformable headers. To overcome this limitation we present loop morphing, which combines fusion with strip mining and loop splitting. We also introduce a technique to steer loop morphing such that we find a compact memory access schedule. Experimental results show that our approach can decrease the execution time by up to 88%.

Citations: 16
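The limitation described in the abstract above, that plain fusion needs loops with conformable headers, can be seen in a toy case with two independent loops of different trip counts. The sketch below shows only the simplest ingredient, loop splitting followed by fusion of the common range; it is a hand-written illustration, not the paper's steering technique, and the function names and loop bodies are hypothetical.

```python
def separate_loops(x, y):
    """Baseline: two independent loops with different trip counts."""
    a = [0] * len(x)
    b = [0] * len(y)
    for i in range(len(x)):
        a[i] = 2 * x[i]
    for j in range(len(y)):
        b[j] = y[j] + 1
    return a, b


def split_and_fused(x, y):
    """Split at the shorter trip count, then fuse the common part.

    The fused body holds two independent memory streams (x/a and y/b),
    which gives a scheduler more accesses to overlap per iteration.
    """
    n, m = len(x), len(y)
    common = min(n, m)
    a = [0] * n
    b = [0] * m
    for i in range(common):          # fused loop over the shared range
        a[i] = 2 * x[i]
        b[i] = y[i] + 1
    for i in range(common, n):       # remainder of the first loop
        a[i] = 2 * x[i]
    for j in range(common, m):       # remainder of the second loop
        b[j] = y[j] + 1
    return a, b


if __name__ == "__main__":
    x, y = [1, 2, 3, 4, 5], [10, 20, 30]
    print(separate_loops(x, y) == split_and_fused(x, y))
```

The transformation is legal here because the two loop bodies touch disjoint arrays; with dependences between the loops, fusion would need a legality check first.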

Modeling and scheduling parallel data flow systems using structured systems of recurrence equations

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10032

François Charot, Madeleine Nyamsi, P. Quinton, Charles Wagner

Citations: 10

Evaluating instruction set extensions for fast arithmetic on binary finite fields

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10003

A. M. Fiskiran, R. Lee

Abstract: Binary finite fields GF(2^n) are very commonly used in cryptography, particularly in public-key algorithms such as elliptic curve cryptography (ECC). On word-oriented programmable processors, field elements are generally represented as polynomials with coefficients from {0, 1}. Key arithmetic operations on these polynomials, such as squaring and multiplication, are not supported by integer-oriented processor architectures. Instead, they are implemented in software, so that a very large fraction of the cryptography execution time is spent in a few elementary operations. For example, more than 90% of the execution time of 163-bit ECC may be consumed by two simple field operations: squaring and multiplication. A few processor architectures that include instructions for binary field arithmetic have been proposed recently. However, these have only considered processors with small wordsizes and in-order, single-issue execution. The first contribution of this paper is to validate these new arithmetic instructions for processors with wider wordsizes and multiple-issue (e.g. superscalar) execution. We also consider the effects of varying the number of functional units and load/store pipes. We demonstrate that the combination of microarchitecture and new instructions provides speedups of up to 22.4x for ECC point multiplication. Second, we show that if a bit-level reverse instruction is included in the instruction set, the size of the multiplier can be reduced by half without significant performance degradation. Third, we compare the benefits of superscalar execution with wordsize scaling. The latter has been used in recent processor architectures such as PLX and PAX as a new way to extract parallelism. We show that 2x wordsize scaling provides 70% better performance than 2-way superscalar execution. Finally, we suggest a low-cost method, which we call multi-word result execution, to realize some of the benefits of wordsize scaling in existing processors with fixed wordsizes.

Citations: 19
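The abstract above notes that polynomial multiplication over {0, 1} coefficients is not supported by integer-oriented processors and has to be emulated in software. A minimal sketch of that emulation follows: a shift-and-XOR carry-less multiply plus reduction modulo an irreducible polynomial. It illustrates the operation the proposed instructions would accelerate, not the instructions themselves. The function names are hypothetical, and the small field GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 is an assumption chosen so the result can be checked against the worked example in FIPS 197.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of a and b over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition of coefficients is XOR, no carries
        a <<= 1
        b >>= 1
    return result


def poly_mod(x: int, f: int) -> int:
    """Remainder of polynomial x modulo polynomial f over GF(2)."""
    deg_f = f.bit_length() - 1
    while x.bit_length() - 1 >= deg_f:
        # Cancel the leading term of x with a shifted copy of f.
        x ^= f << (x.bit_length() - 1 - deg_f)
    return x


def gf2n_mul(a: int, b: int, f: int) -> int:
    """Multiply two field elements of GF(2^n) defined by f."""
    return poly_mod(clmul(a, b), f)


if __name__ == "__main__":
    AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 (illustrative field choice)
    print(hex(gf2n_mul(0x57, 0x83, AES_POLY)))  # prints 0xc1
```

The bit-serial loop in `clmul` runs once per operand bit, which hints at why a 163-bit field multiplication in software dominates ECC execution time.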

Families of FPGA-based algorithms for approximate string matching

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10013

T. Court, M. Herbordt

Citations: 52

Decimal floating-point division using Newton-Raphson iteration

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10005

Liang-Kai Wang, M. Schulte

Citations: 42