{"title":"Comparison of branching CORDIC implementations","authors":"Abhishek Singh, D. Phatak, T. Goff, Mike Riggs, J. Plusquellic, C. Patel","doi":"10.1109/ASAP.2003.1212845","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212845","url":null,"abstract":"We compare implementations of Duprat and Muller's branching CORDIC and Phatak's double step branching (DSB)-CORDIC algorithms for sine and cosine evaluation. For reference we also report on classical CORDIC implementations for the same wordlengths. We have also implemented double stepping in the classical algorithm and report on the performance of this method. CORDIC evaluation of sine and cosine includes two parts, the zeroer and the rotator. We discuss implementation issues related to the minimization of the delay of each iteration of the algorithm (including delays for both the zeroer as well as the rotator). We then examine hybrid methods that select the components from different algorithms (such as a DSB zeroer together with a classical rotator or vice versa).","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123006769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy aware register file implementation through instruction predecode","authors":"J. Ayala, M. López-Vallejo, A. Veidenbaum, Carlos A. Lopez","doi":"10.1109/ASAP.2003.1212832","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212832","url":null,"abstract":"The register file is a power-hungry device in modern architectures. Current research on compiler technology and computer architectures encourages the implementation of larger devices to feed multiple data paths and to store global variables. However, low power techniques are not able to appreciably reduce power consumption in this device without a time penalty. We introduce an efficient hardware approach to reduce the register file energy consumption by turning unused registers into a low power state. Bypassing the register fields of the fetched instruction to the decode stage allows the identification of registers required by the current instruction (instruction predecode) and allows the control logic to turn them back on. They are put back into the low-power state after the instruction has used them. This technique achieves an 85% energy reduction with no performance penalty. The simplicity of the approach makes it an effective low-power technique for embedded processors.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122853929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A floating-point CORDIC based SVD processor","authors":"Zhaohui Liu, K. Dickson, J. McCanny","doi":"10.1109/ASAP.2003.1212843","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212843","url":null,"abstract":"An SVD processor system is presented in which each processing element is implemented using a simple CORDIC unit. The internal recursive loop within the CORDIC module is exploited, with pipelining being used to multiplex the two independent microrotations onto a single CORDIC processor. This leads to a high performance and efficient hardware architecture. In addition, a novel method for scale factor correction is presented which only need be applied once at the end of the computation. This also reduces the computation time. The net result is an SVD architecture based on a conventional CORDIC approach, which combines high performance with high silicon area efficiency.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133705745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A family of parallel-prefix modulo 2^n-1 adders","authors":"G. Dimitrakopoulos, H. T. Vergos, D. Nikolos, C. Efstathiou","doi":"10.1109/ASAP.2003.1212856","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212856","url":null,"abstract":"We reveal the cyclic nature of idempotency in the case of modulo 2^n-1 addition. Then, based on this property, we derive, for each n, a family of minimum logic depth modulo 2^n-1 adders, which allows several trade-offs between the number of operators, the internal wire length, and the fanout of internal nodes. Performance data, gathered using static CMOS implementations, reveal that the proposed architectures outperform all previously reported ones in terms of area and/or operation speed.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133580165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating memory architectures for media applications on coarse-grained reconfigurable architectures","authors":"Jongeun Lee, Kiyoung Choi, N. Dutt","doi":"10.1109/ASAP.2003.1212841","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212841","url":null,"abstract":"Reconfigurable ALU array (RAA) architectures - representing a popular class of coarse-grained reconfigurable architectures - are gaining in popularity especially for media applications due to their flexibility, regularity, and efficiency. In such architectures, memory is critical not only for configuration data but also for the heavy data traffic required by the application. Hence, system designers would like to evaluate the effects of different memory architectures and memory traffic early in the design process. We offer a scheme for system designers to quickly estimate the performance of media applications on RAA architectures. The proposed scheme is based on the performance-oriented model of RAA architectures, which we develop to model different memory architectures in a uniform way so as to allow for easy mapping of application loops and early performance estimation. Our experimental results estimating the performance of multimedia applications on three memory architectures demonstrate the flexibility of our memory architecture evaluation scheme as well as the varying effects of the memory architectures on the application performance, which also signifies the need for memory architecture evaluation early in the design process.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130673832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-specific computing with adaptive register file architectures","authors":"R. Sangireddy, Arun Kumar Somani","doi":"10.1109/ASAP.2003.1212842","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212842","url":null,"abstract":"The demand for higher computing power to effectively execute compute-intensive functions and thus more on-chip computing resources is ever increasing. On the other hand, applications that demand larger on-chip memory bandwidth are continuously emerging. We propose adaptive register file computing (ARC) unit, a novel on-chip processing element that leverages application-specific processing capabilities. The ARC unit supplements a conventional register file to provide large memory bandwidth, or acts as a configurable computing unit to provide higher on-chip computing capacity, depending on the requirement of a specific application. When an out-of-order 8-wide issue superscalar processor is supplemented with the ARC unit to process matrix multiplication, a compute-intensive core function in most multimedia applications, results show a performance increase of up to 12%. Similarly, a 9% performance enhancement is seen when the matrix multiplication is performed in an out-of-order 4-wide issue superscalar processor supplemented with the ARC unit. We also discuss the microarchitecture level details for the implementation of the ARC unit.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123154804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware process networks","authors":"H. W. V. Dijk, H. Sips, E. Deprettere","doi":"10.1109/ASAP.2003.1212825","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212825","url":null,"abstract":"In industry, embedded systems for stream-based processing are often modelled and verified by using process networks, such as Kahn process networks. An advantage of Kahn networks is that they allow asynchronous operation of process components in a network. A problem in these networks, however, is that asynchronously interfering events cannot be handled properly because they are intrinsically indeterminate and therefore destroy the compositional properties of the network. We propose to extend the Kahn model of computations with a simple indeterminate construct. We call the resulting network a context-aware process network (CAPN). We show that these networks are capable of handling certain classes of events and can still be reduced to a class of parametrised Kahn networks.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130963269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A VLSI architecture for advanced video coding motion estimation","authors":"S. Y. Yap, J. McCanny","doi":"10.1109/ASAP.2003.1212853","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212853","url":null,"abstract":"With the advent of new video standards such as MPEG-4 part-10 and H.264/H.26L, demands for advanced video coding (AVC), particularly in the area of variable block searching motion estimation (VBSME), are increasing. This has led to research into suitable flexible hardware architectures to perform the various types of VBSME. We propose a new 1-D VLSI architecture for full search variable block size motion estimation (FSVBSME). The variable block size, sum of absolute differences (SAD) computation is performed by reusing the results of smaller subblock computations. These are permuted and combined by incorporating a shuffling mechanism within each processing element (PE). Whereas a conventional 1-D architecture can process only one motion vector, this architecture can process up to 41 motion vector (MV) subblocks (within a macroblock) in a comparable number of clock cycles.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124867782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Switched memory architectures - moving beyond systolic arrays","authors":"Lakshminarayanan Renganarayanan, S. Rajopadhye","doi":"10.1109/ASAP.2003.1212827","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212827","url":null,"abstract":"Although current ASIC, FPGA and reconfigurable computing technologies support on-chip memories and hardware reconfiguration, these features are not exploited by systolic arrays and their associated synthesis methods. We propose a new architectural model called switched memory architecture (SMA) to overcome these limitations. SMAs are (strictly) more powerful than systolic arrays, are suitable for a wide range of target technologies, and can be derived through the well developed design methodology of the polyhedral model. We illustrate the power of SMAs by showing how any SARE with a one dimensional schedule can be implemented as an SMA without any slowdown. We formally characterize the class of allocation functions that are suitable for SMAs and also describe a systematic procedure for deriving SMAs from SAREs.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121991478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using group theory to specify application specific interconnection networks for SIMD DSPs","authors":"Thorsten Dräger, G. Fettweis","doi":"10.1109/ASAP.2003.1212829","DOIUrl":"https://doi.org/10.1109/ASAP.2003.1212829","url":null,"abstract":"We introduce another view of group theory in the field of interconnection networks. With this approach it is possible to specify application specific network topologies for permutation data transfers. Routing of data transfers is generated and all possible permutation data transfers are guaranteed. We present the approach by means of a kind of SIMD DSP.","PeriodicalId":261592,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127244473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}