Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.最新文献

筛选
英文 中文
Optimized data-reuse in processor arrays 优化了处理器数组中的数据重用
Sebastian Siegel, R. Merker
{"title":"Optimized data-reuse in processor arrays","authors":"Sebastian Siegel, R. Merker","doi":"10.1109/ASAP.2004.10024","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10024","url":null,"abstract":"We present a method for co-partitioning affine indexed algorithms resulting in a processor array with an optimized data-reuse. Through this method, a memory hierarchy with an optimized data transfer is derived which allows a significant reduction of the power consumption caused by memory accesses. Apart from former design flows which begin with a space-time transformation, we start with the co-partitioning of the iteration space. This allows an adaption of the resulting processor array towards the constraints of the target architecture at the beginning of the design. We illustrate our method for the full search motion estimation algorithm which bears a high potential of data-reuse.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125225650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
A hierarchical classification scheme to derive interprocess communication in process networks 一种派生进程网络中进程间通信的分层分类方案
A. Turjan, B. Kienhuis, E. Deprettere
{"title":"A hierarchical classification scheme to derive interprocess communication in process networks","authors":"A. Turjan, B. Kienhuis, E. Deprettere","doi":"10.1109/ASAP.2004.10025","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10025","url":null,"abstract":"The Compaan compiler automatically derives a process network (PN) description from an application written in Matlab. The basic element of a PN is a producer/consumer (P/C) pair. Four different communication patterns for a P/C pair have been identified and the complexity of communication structure differs depending on the communication pattern involved. Therefore, in order to obtain cost-efficient process networks our compiler automatically identifies the communication pattern of each P/C pair. This problem is equivalent to integer linear programming and thus in general can not be solved efficiently. In this paper we present simpler techniques that allow classifying the interprocess communication in a PN. However, in some cases those techniques do not allow to find an answer and therefore, an ILP test has still to be applied. Thus, we introduce a hierarchical classification scheme that correctly classifies the interprocess communication, but uses dramatically less integer linear programming, in only 5% of the cases to classify, we still rely on integer linear programming; in the remaining 95%, the techniques presented Are able to classify a case correctly.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122064995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Architectural support for arithmetic in optimal extension fields 对最优扩展字段中的算法的体系结构支持
J. Großschädl, Sandeep S. Kumar, C. Paar
{"title":"Architectural support for arithmetic in optimal extension fields","authors":"J. Großschädl, Sandeep S. Kumar, C. Paar","doi":"10.1109/ASAP.2004.10004","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10004","url":null,"abstract":"Public-key cryptosystems generally involve computation-intensive arithmetic operations, making them impractical for software implementation on constrained devices such as smart cards. We investigate the potential of architectural enhancements and instruction set extensions for low-level arithmetic used in public-key cryptography, most notably multiplication in finite fields of large order. The focus of the present work is directed towards a special type of finite fields, the so-called optimal extension fields GF(p/sup m/) where p is a pseudo-Mersenne (PM) prime of the form p = 2/sup n/ - c that fits into a single register. Based on the M/PS32 instruction set architecture, we introduce two custom instructions to accelerate the reduction modulo a PM prime. Moreover, we show that the multiplication in an optimal extension field can take advantage of a multiply/accumulate unit with a wide accumulator so that a certain number of 64-bit products can be summed up without overflow. The proposed extensions support a wide range of PM primes and allow a reduction modulo 2/sup n/ - c to complete in only four clock cycles when n /spl les/ 32.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127525274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
A novel highly reliable low-power nano architecture when von Neumann augments Kolmogorov 当冯·诺依曼增强柯尔莫哥洛夫时,一种新颖的高可靠的低功耗纳米架构
Valeriu Beiu
{"title":"A novel highly reliable low-power nano architecture when von Neumann augments Kolmogorov","authors":"Valeriu Beiu","doi":"10.1109/ASAP.2004.10021","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10021","url":null,"abstract":"This work presents a novel architecture, which is both device and circuit independent. The starting idea is that computations can be performed in three fundamentally different ways: entirely digital (using Boolean gates), entirely analog (using analog circuits), or mixed (using both digital and analog circuits). The boundaries between these are sometimes very thin. As an example, a threshold logic gate is already mixed, i.e. even if the inputs and the output are Boolean, the weighted sum-of-inputs is a multiple-valued logic signal, i.e. a low-precision analog signal. It has already been suggested that, at least for CMOS, a mixed analog/digital approach is the most power-efficient solution. Still, the main disadvantages of using analog circuits are: (i) their more complex (handcrafted) design, and (ii) their (expected) lower reliability (signal-to-noise or precision), which will be exacerbated by scaling. Here, we will show how both these disadvantages could be tackled. A constructive solution for Kolmogorov's superposition and (multi-threshold) threshold logic synthesis could be used for automating the design. Digital or threshold logic circuits will compensate for the accumulation of noise in the cascaded (very) low precision analog circuits. These digital circuits will also contribute to a von Neumann's multiplexing scheme used to augment the defect- and fault-tolerance of the architecture. A few examples will show how this architectural approach could be mapped on top of a given (nano) technology.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128479392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
Design of the QBIC wearable computing platform QBIC可穿戴计算平台的设计
O. Amft, M. Lauffer, Stijn Ossevoort, Fabrizio Macaluso, P. Lukowicz, G. Tröster
{"title":"Design of the QBIC wearable computing platform","authors":"O. Amft, M. Lauffer, Stijn Ossevoort, Fabrizio Macaluso, P. Lukowicz, G. Tröster","doi":"10.1109/ASAP.2004.10001","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10001","url":null,"abstract":"Wearable computing systems can be broadly defined as mobile electronic devices that can be unobtrusively embedded in a user's outfit as part of the garment or an accessory. Unlike conventional mobile devices, such systems shall be virtually invisible, not hindering physical activity, always active and running without user's attention. We present our wearability driven design approach and the philosophy for a novel wearable computing system integrated into a fully functional belt. This system integrates the main electronics in the buckle of a belt and utilizes the belt itself as extension bus and mechanical support for add ons. The system runs GNU/Linux operating system and has sufficient resources to address a variety of applications in the field of wearable computing. Considerations regarding ergonomic design, system architecture, first implementation results and applications are presented.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122172426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 74
Optimizing the memory bandwidth with loop morphing 利用循环变形优化存储器带宽
J. I. Gómez, P. Marchal, Sven Verdoolaege, L. Piñuel, F. Catthoor
{"title":"Optimizing the memory bandwidth with loop morphing","authors":"J. I. Gómez, P. Marchal, Sven Verdoolaege, L. Piñuel, F. Catthoor","doi":"10.1109/ASAP.2004.10020","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10020","url":null,"abstract":"The memory bandwidth largely determines the performance of embedded systems. However, very often compilers ignore the actual behavior of the memory architecture, causing large performance loss. To better utilize the memory bandwidth, several researchers have introduced instruction scheduling/data assignment techniques. Because they only optimize the bandwidth inside each basic block, they often fail to use all available bandwidth. Loop fusion is an interesting alternative to more globally optimize the memory access schedule. By fusing loops we increase the number of independent memory operations inside each basic block. The compiler can then better exploit the available bandwidth and increase the system's performance. However, existing fusion techniques can only combine loops with a conformable header. To overcome this limitation we present loop morphing; we combine fusion with strip mining and loop splitting. We also introduce a technique to steer loop morphing such that we find a compact memory access schedule. Experimental results show that with our approach we can decrease the execution time up to 88%.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125593869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Modeling and scheduling parallel data flow systems using structured systems of recurrence equations 用递归方程的结构化系统建模和调度并行数据流系统
François Charot, Madeleine Nyamsi, P. Quinton, Charles Wagner
{"title":"Modeling and scheduling parallel data flow systems using structured systems of recurrence equations","authors":"François Charot, Madeleine Nyamsi, P. Quinton, Charles Wagner","doi":"10.1109/ASAP.2004.10032","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10032","url":null,"abstract":"Many multimedia and telecommunications applications are modeled as multi-rate, parallel data flow systems. We present techniques to model and schedule such applications using structured systems of recurrence equations. We show that the schedule can be obtained first by computing the period of each component of the system, then by applying structured scheduling to the entire system. This method is implemented in the MMAlpha software, and it is applied to model a WCDMA uplink receiver.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114396139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Evaluating instruction set extensions for fast arithmetic on binary finite fields 二元有限域上快速算法的指令集扩展评估
A. M. Fiskiran, R. Lee
{"title":"Evaluating instruction set extensions for fast arithmetic on binary finite fields","authors":"A. M. Fiskiran, R. Lee","doi":"10.1109/ASAP.2004.10003","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10003","url":null,"abstract":"Binary finite fields GF(2/sup n/) are very commonly used in cryptography, particularly in public-key algorithms such as elliptic curve cryptography (ECC). On word-oriented programmable processors, field elements are generally represented as polynomials with coefficients from [0, 1]. Key arithmetic operations on these polynomials, such as squaring and multiplication, are not supported by integer-oriented processor architectures. Instead, these are implemented in software, causing a very large fraction of the cryptography execution time to be dominated by a few elementary operations. For example, more than 90% of the execution time of 163-bit ECC may be consumed by two simple field operations: squaring and multiplication. A few processor architectures have been proposed recently that include instructions for binary field arithmetic. However, these have only considered processors with small wordsizes and in-order, single-issue execution. The first contribution of this paper is to validate these new arithmetic instructions for processors with wider wordsizes and multiple-issue (e.g. superscalar) execution. We also consider the effects of varying the number of functional units and load/store pipes. We demonstrate that the combination of microarchitecture and new instructions provides speedups up to 22.4x for ECC point multiplication. Second, we show that if a bit-level reverse instruction is included in the instruction set, the size of the multiplier can be reduced by half without significant performance degradation. Third, we compare the benefits of superscalar execution with wordsize scaling. The latter has been used in recent processor architectures such as PLX and PAX as a new way to extract parallelism. We show that 2x wordsize scaling provides 70% better performance than 2-way superscalar execution. Finally, we suggest a low-cost method, which we call multi-word result execution, to realize some of the benefits of wordsize scaling in existing processors with fixed wordsizes.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124819789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Families of FPGA-based algorithms for approximate string matching 基于fpga的近似字符串匹配算法家族
T. Court, M. Herbordt
{"title":"Families of FPGA-based algorithms for approximate string matching","authors":"T. Court, M. Herbordt","doi":"10.1109/ASAP.2004.10013","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10013","url":null,"abstract":"Dynamic programming for approximate string matching is a large family of different algorithms, which vary significantly in purpose, complexity, and hardware utilization. Many implementations have reported impressive speed-ups, but have typically been point solutions -highly specialized and addressing only one or a few of the many possible options. The problem to be solved is creating a hardware description that implements a broad range of behavioral options without losing efficiency due to feature bloat. We report a set of three component types that address different parts of the DP string matching problem. Multiple, interchangeable implementations are available for each component type. This allows each application to choose the feature set required, then make maximum use of the FPGA fabric according to that application's specific resource requirements. Synthesis estimates show a 4:1 improvement in time-space performance, depending on the options chosen for a specific matching task.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128289366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 52
Decimal floating-point division using Newton-Raphson iteration 使用牛顿-拉夫森迭代的十进制浮点除法
Liang-Kai Wang, M. Schulte
{"title":"Decimal floating-point division using Newton-Raphson iteration","authors":"Liang-Kai Wang, M. Schulte","doi":"10.1109/ASAP.2004.10005","DOIUrl":"https://doi.org/10.1109/ASAP.2004.10005","url":null,"abstract":"Decreasing feature sizes allow additional functionality to be added to future microprocessors to improve the performance of important application domains. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE-754R). This work presents an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an optimized piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a simplified combined decimal incrementer/decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with IEEE-754R, has an estimated critical path delay of 0.69 ns when implemented using LSI Logic's 0.11 micron gflx-p standard cell library.","PeriodicalId":120245,"journal":{"name":"Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130910181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 42
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信