Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003: Latest Papers

Comparison of branching CORDIC implementations
Abhishek Singh, D. Phatak, T. Goff, Mike Riggs, J. Plusquellic, C. Patel
{"title":"Comparison of branching CORDIC implementations","authors":"Abhishek Singh, D. Phatak, T. Goff, Mike Riggs, J. Plusquellic, C. Patel","doi":"10.1109/ASAP.2003.1212845","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212845","url":null,"abstract":"We compare implementations of Duprat and Muller's branching CORDIC and Phatak's double step branching (DSB)-CORDIC algorithms for sine and cosine evaluation. For reference we also report on classical CORDIC implementations for the same wordlengths. We have also implemented double stepping in the classical algorithm and report on the performance of this method. CORDIC evaluation of sine and cosine includes two parts, the zeroer and the rotator. We discuss implementation issues related to the minimization of the delay of each iteration of the algorithm (including delays for both the zeroer as well as the rotator). We then examine hybrid methods that select the components from different algorithms (such as a DSB zeroer together with a classical rotator or vice versa).","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123006769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
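The abstract of the CORDIC entry above splits sine/cosine evaluation into a zeroer (which drives the residual angle toward zero) and a rotator (which applies the matching shift-and-add rotations). As a rough illustration only, here is a minimal floating-point sketch of the classical (non-branching, single-step) CORDIC in rotation mode. It is not the branching or DSB variant compared in the paper; the function name `cordic_sin_cos`, the iteration count, and the use of floats instead of fixed-point words are all assumptions made for this example.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Classical rotation-mode CORDIC for |theta| <= pi/2 (floating-point model)."""
    # Elementary rotation angles atan(2^-i) and the constant scale factor K.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start from a pre-scaled unit vector; z holds the residual angle.
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        # Zeroer: choose the direction that drives z toward zero.
        d = 1.0 if z >= 0.0 else -1.0
        # Rotator: one shift-and-add micro-rotation (tuple assignment uses old x, y).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # (sin(theta), cos(theta))
```

A hardware implementation would replace the multiplications by `2.0 ** -i` with arithmetic shifts and store the angle table in ROM; the sketch keeps them explicit so the zeroer and rotator steps stay easy to read.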
Energy aware register file implementation through instruction predecode
J. Ayala, M. López-Vallejo, A. Veidenbaum, Carlos A. Lopez
{"title":"Energy aware register file implementation through instruction predecode","authors":"J. Ayala, M. López-Vallejo, A. Veidenbaum, Carlos A. Lopez","doi":"10.1109/ASAP.2003.1212832","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212832","url":null,"abstract":"The register file is a power-hungry device in modern architectures. Current research on compiler technology and computer architectures encourages the implementation of larger devices to feed multiple data paths and to store global variables. However, low power techniques are not able to appreciably reduce power consumption in this device without a time penalty. We introduce an efficient hardware approach to reduce the register file energy consumption by turning unused registers into a low power state. Bypassing the register fields of the fetch instruction to the decode stage allows the identification of registers required by the current instruction (instruction predecode) and allows the control logic to turn them back on. They are put into the low-power state after the instruction use. This technique achieves an 85% energy reduction with no performance penalty. The simplicity of the approach makes it an effective low-power technique for embedded processors.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122853929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
A floating-point CORDIC based SVD processor
Zhaohui Liu, K. Dickson, J. McCanny
{"title":"A floating-point CORDIC based SVD processor","authors":"Zhaohui Liu, K. Dickson, J. McCanny","doi":"10.1109/ASAP.2003.1212843","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212843","url":null,"abstract":"An SVD processor system is presented in which each processing element is implemented using a simple CORDIC unit. The internal recursive loop within the CORDIC module is exploited, with pipelining being used to multiplex the two independent microrotations onto a single CORDIC processor. This leads to a high performance and efficient hardware architecture. In addition, a novel method for scale factor correction is presented which only need be applied once at the end of the computation. This also reduces the computation time. The net result is an SVD architecture based on a conventional CORDIC approach, which combines high performance with high silicon area efficiency.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133705745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
A family of parallel-prefix modulo 2^n-1 adders
G. Dimitrakopoulos, H. T. Vergos, D. Nikolos, C. Efstathiou
{"title":"A family of parallel-prefix modulo 2^n-1 adders","authors":"G. Dimitrakopoulos, H. T. Vergos, D. Nikolos, C. Efstathiou","doi":"10.1109/ASAP.2003.1212856","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212856","url":null,"abstract":"We reveal the cyclic nature of idempotency in the case of modulo 2^n-1 addition. Then based on this property, we derive for each n, a family of minimum logic depth modulo 2^n-1 adders, which allows several trade-offs between the number of operators, the internal wire length, and the fanout of internal nodes. Performance data, gathered using static CMOS implementations, reveal that the proposed architectures outperform all previously reported ones in terms of area and/or operation speed.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133580165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
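For readers unfamiliar with the arithmetic that the adder entry above targets, the following is a small behavioural reference model of addition modulo 2^n - 1 using an end-around carry. It only models the result an adder of this kind must produce; it says nothing about the parallel-prefix structures proposed in the paper. The function name `add_mod_2n_minus_1` and the choice to normalise the all-ones pattern to zero are assumptions made for this sketch.

```python
def add_mod_2n_minus_1(a, b, n):
    """Add two n-bit operands modulo 2**n - 1 using an end-around carry."""
    mask = (1 << n) - 1
    s = a + b
    # Feed the carry-out of the n-bit sum back in as a carry-in.
    s = (s & mask) + (s >> n)
    # The all-ones pattern is a second encoding of zero; normalise it.
    return 0 if s == mask else s
```

Because 2^n is congruent to 1 modulo 2^n - 1, a carry out of bit n-1 has weight one, which is why re-injecting it at the least significant position gives the correct residue.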
Evaluating memory architectures for media applications on coarse-grained reconfigurable architectures
Jongeun Lee, Kiyoung Choi, N. Dutt
{"title":"Evaluating memory architectures for media applications on coarse-grained reconfigurable architectures","authors":"Jongeun Lee, Kiyoung Choi, N. Dutt","doi":"10.1109/ASAP.2003.1212841","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212841","url":null,"abstract":"Reconfigurable ALU array (RAA) architectures - representing a popular class of coarse-grained reconfigurable architectures - are gaining in popularity especially for media applications due to their flexibility, regularity, and efficiency. In such architectures, memory is critical not only for configuration data but also for the heavy data traffic required by the application. Hence, system designers would like to evaluate the effects of different memory architectures and memory traffic early in the design process. We offer a scheme for system designers to quickly estimate the performance of media applications on RAA architectures. The proposed scheme is based on the performance-oriented model of RAA architectures, which we develop to model different memory architectures in a uniform way so as to allow for easy mapping of application loops and early performance estimation. Our experimental results estimating the performance of multimedia applications on three memory architectures demonstrate the flexibility of our memory architecture evaluation scheme as well as the varying effects of the memory architectures on the application performance, which also signifies the need for memory architecture evaluation early in the design process.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130673832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Application-specific computing with adaptive register file architectures
R. Sangireddy, Arun Kumar Somani
{"title":"Application-specific computing with adaptive register file architectures","authors":"R. Sangireddy, Arun Kumar Somani","doi":"10.1109/ASAP.2003.1212842","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212842","url":null,"abstract":"The demand for higher computing power to effectively execute compute-intensive functions and thus more on-chip computing resources is ever increasing. On the other hand, applications that demand larger on-chip memory bandwidth are continuously emerging. We propose adaptive register file computing (ARC) unit, a novel on-chip processing element that leverages application-specific processing capabilities. The ARC unit supplements a conventional register file to provide large memory bandwidth, or acts as a configurable computing unit to provide higher on-chip computing capacity, depending on the requirement of a specific application. When an out-of-order 8-wide issue superscalar processor is supplemented with the ARC unit to process matrix multiplication, a compute-intensive core function in most multimedia applications, results show a performance increase of up to 12%. Similarly, a 9% performance enhancement is seen when the matrix multiplication is performed in an out-of-order 4-wide issue superscalar processor supplemented with the ARC unit. We also discuss the microarchitecture level details for the implementation of the ARC unit.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123154804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Context-aware process networks
H. W. V. Dijk, H. Sips, E. Deprettere
{"title":"Context-aware process networks","authors":"H. W. V. Dijk, H. Sips, E. Deprettere","doi":"10.1109/ASAP.2003.1212825","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212825","url":null,"abstract":"In industry, embedded systems for stream-based processing are often modelled and verified by using process networks, such as Kahn process networks. An advantage of Kahn networks is that they allow asynchronous operation of process components in a network. A problem in these networks, however, is that asynchronously interfering events cannot be handled properly because they are intrinsically indeterminate and therefore destroy the compositional properties of the network. We propose to extend the Kahn model of computations with a simple indeterminate construct. We call the resulting network a context-aware process network (CAPN). We show that these networks are capable of handling certain classes of events and can still be reduced to a class of parametrised Kahn networks.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130963269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
A VLSI architecture for advanced video coding motion estimation
S. Y. Yap, J. McCanny
{"title":"A VLSI architecture for advanced video coding motion estimation","authors":"S. Y. Yap, J. McCanny","doi":"10.1109/ASAP.2003.1212853","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212853","url":null,"abstract":"With the advent of new video standards such as MPEG-4 part-10 and H.264/H.26L, demands for advanced video coding (AVC), particularly in area of variable block searching motion estimation (VBSME), are increasing. This has led to research into suitable flexible hardware architectures to perform the various types of VBSME. We propose a new 1-D VLSI architecture for full search variable block size motion estimation (FSVBSME). The variable block size, sum of absolute differences (SAD) computation is performed by reusing the results of smaller subblock computations. These are permuted and combined by incorporating a shuffling mechanism within each processing element (PE). Whereas a conventional 1-D architecture can process only one motion vector, this architecture can process up to 41 motion vector (MV) subblocks (within a macroblock) in a comparable number of clock cycles.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124867782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
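The motion estimation entry above mentions that variable block size SADs are obtained by reusing smaller subblock results, and that a macroblock yields up to 41 subblocks. A short software sketch can make that count concrete. It assumes a 16x16 macroblock, 4x4 base subblocks, and the seven H.264 partition shapes (4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16); none of these specifics are stated in the abstract itself, and the function names are hypothetical. It models only the arithmetic, not the paper's 1-D array or its shuffling mechanism.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 subblock of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for r in range(16):
        for c in range(16):
            grid[r // 4][c // 4] += abs(cur[r][c] - ref[r][c])
    return grid


def variable_block_sads(grid):
    """Combine 4x4 SADs into SADs for all partitions.

    Keys are (height, width, top, left) in pixels.
    """
    # Assumed partition shapes (height, width), as in H.264.
    sizes = [(4, 4), (4, 8), (8, 4), (8, 8), (8, 16), (16, 8), (16, 16)]
    sads = {}
    for h, w in sizes:
        for top in range(0, 16, h):
            for left in range(0, 16, w):
                # Reuse the precomputed 4x4 SADs instead of touching pixels again.
                sads[(h, w, top, left)] = sum(
                    grid[r][c]
                    for r in range(top // 4, (top + h) // 4)
                    for c in range(left // 4, (left + w) // 4)
                )
    return sads
```

Under these assumptions the partitions number 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41, which matches the figure quoted in the abstract.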
Switched memory architectures - moving beyond systolic arrays
Lakshminarayanan Renganarayanan, S. Rajopadhye
{"title":"Switched memory architectures - moving beyond systolic arrays","authors":"Lakshminarayanan Renganarayanan, S. Rajopadhye","doi":"10.1109/ASAP.2003.1212827","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212827","url":null,"abstract":"Although current ASIC, FPGA and reconfigurable computing technologies support on-chip memories and hardware reconfiguration, these features are not exploited by systolic arrays and their associated synthesis methods. We propose a new architectural model called switched memory architecture (SMA) to overcome these limitations. SMAs are (strictly) more powerful than systolic arrays, are suitable for a wide range of target technologies, and can be derived through the well developed design methodology of the polyhedral model. We illustrate the power of SMAs by showing how any SARE with a one dimensional schedule can be implemented as an SMA without any slowdown. We formally characterize the class of allocation functions that are suitable for SMAs and also describe a systematic procedure for deriving SMAs from SAREs.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121991478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Using group theory to specify application specific interconnection networks for SIMD DSPs
Thorsten Dräger, G. Fettweis
{"title":"Using group theory to specify application specific interconnection networks for SIMD DSPs","authors":"Thorsten Dräger, G. Fettweis","doi":"10.1109/ASAP.2003.1212829","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212829","url":null,"abstract":"We introduce another view of group theory in the field of interconnection networks. With this approach it is possible to specify application specific network topologies for permutation data transfers. Routing of data transfers is generated and all possible permutation data transfers are guaranteed. We present the approach by means of a kind of SIMD DSP.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127244473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4