{"title":"Cluster-level simultaneous multithreading for VLIW processors","authors":"Manoj Gupta, F. Sánchez, J. Llosa","doi":"10.1109/ICCD.2007.4601890","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601890","url":null,"abstract":"Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous multithreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. In this paper, we propose CSMT (cluster-level simultaneous multithreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over interleaved multithreading (IMT). In some cases, speedup can be as high as 228% over single thread architecture and 97% over IMT.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"29 1","pages":"121-128"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87082564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA global routing architecture optimization using a multicommodity flow approach","authors":"Yuanfang Hu, Yi Zhu, M. Taylor, Chung-Kuan Cheng","doi":"10.1109/ICCD.2007.4601893","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601893","url":null,"abstract":"Low energy and small switch area usage are two of the important design objectives in FPGA global routing architecture design. This paper presents an improved MCF model based CAD flow that performs aggressive optimizations, such as topology and wire style optimizations, to reduce the energy and switch area of FPGA global routing architectures. The experiments show that when compared to traditional mesh architecture, the optimized FPGA routing architectures achieve up to 10% to 15% energy savings and up to 20% switch area savings on average for a set of seven benchmark circuits.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"1 1","pages":"144-151"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81329754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On modeling impact of sub-wavelength lithography on transistors","authors":"Aswin Sreedhar, S. Kundu","doi":"10.1109/ICCD.2007.4601884","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601884","url":null,"abstract":"As the VLSI technology marches beyond 65 and 45 nm process technologies, variation in gate length has a direct impact on leakage and performance of CMOS transistors. Due to sub-wavelength lithography, the shape of the transistor often differs from idealized rectangles. In silicon, the effective channel length of a transistor varies across its width. This is a modeling problem. The average effective channel length is different for ON current and OFF currents, making it difficult, if not impossible for a single Leff to accurately represent both. In this paper, we report an accurate post-litho non-rectangular transistor modeling methodology. We further studied the impact of focus and dose variations in lithographic process on transistor parameters. The resulting transistor models were applied for standard cell characterization in successive steps of lithographic simulation of layout and device characterization. Results show that the new models can improve the accuracy of estimation of leakage current by 40% or more over a nominal model that is primarily tuned for ON current.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"94 1","pages":"84-90"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81706030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A power gating scheme for ground bounce reduction during mode transition","authors":"Ku He, Rong Luo, Yu Wang","doi":"10.1109/ICCD.2007.4601929","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601929","url":null,"abstract":"Power gating is an effective method to reduce leakage power during the circuit sleep mode; however, it introduces the ground bounce problem and has considerable energy consumption during the mode transitions. To mitigate the ground bounce, we propose a novel power gating scheme that reduces the magnitude of the peak current and voltage glitches as well as the time to stabilize power and ground during mode transitions. To further decrease the wakeup time while keeping the energy efficiency, we introduce two improved circuit schemes with two intermediate states, based on our proposed power gating scheme. The scheme provides an average peak voltage reduction of 67.0%, and the wakeup time reduction is up to 62.3%. If the circuits use the intermediate schemes, wakeup time can be further reduced by a maximum of 95.7%. Besides these reductions, our proposed circuit scheme also has the advantage of small size and flexible controllability.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"17 1","pages":"388-394"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85060020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-arithmetic carry chains for reconfigurable fabrics","authors":"Michael T. Frederick, Arun Kumar Somani","doi":"10.1109/ICCD.2007.4601892","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601892","url":null,"abstract":"Reconfigurable fabrics cater to a wide variety of applications, but have adopted specialized components to allow efficient implementation of performance-critical arithmetic operations. Carry chains have been integrated into the fabric typically as an optimized ripple-carry chain. However, in non-arithmetic operations the carry chain goes unused, when it could be a valuable adjacent-cell interconnect resource. This paper presents a cell architecture facilitating reuse, as well as an analysis of the potential benefits of reuse for a sampling of common algorithms using commercial FPGAs. Technology map experiments indicate that a variety of applications can benefit from reuse, with utilized routing resources reduced by up to 13% and maximum clock frequency increased by up to 47%.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"45 1","pages":"137-143"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88070792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and prospects of SDR for mobile phones","authors":"U. Ramacher","doi":"10.1109/ICCD.2007.4601903","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601903","url":null,"abstract":null,"PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"290 1","pages":"215-215"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77155028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating voltage islands in CMPs under process variations","authors":"Abhishek Das, S. Ozdemir, G. Memik, A. Choudhary","doi":"10.1109/ICCD.2007.4601891","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601891","url":null,"abstract":"Parameter variations are a major factor causing power-performance asymmetry in chip multiprocessors. In this paper, we analyze the effects of within-die (WID) process variations on chip multicore processors and then apply a variable voltage island scheme to minimize power dissipation. Our idea is based on the observation that due to process variations, the critical paths in each core are likely to have different latencies, resulting in core-to-core (C2C) variations. As a result, each core can operate correctly under different supply voltage levels, achieving an optimal power consumption level. Particularly, we analyze voltage islands at different granularities ranging from a single core to a group of cores. We show that the dynamic power consumption can be reduced by up to 36.2% when each core can set its individual supply voltage level. In addition, for most manufacturing technologies, significant power savings can be achieved with only a few voltage islands on the whole chip: a single customized voltage setting can reduce the power consumption by up to 31.5%. Since the nominal operating frequency remains unchanged after the modifications, our scheme incurs no performance overhead.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"38 3 1","pages":"129-136"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79388788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithms to simplify multi-clock/edge timing constraints","authors":"V. Nagbhushan, C. Y. Chen","doi":"10.1109/ICCD.2007.4601937","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601937","url":null,"abstract":"The use of multiple clocks has become a common practice in modern microprocessor design. With multiple clocks, the timing specifications have become complicated and tend to go beyond the ability of single-clock based CAD tools. This paper first introduces the concept of timing specification transformation. Then, this paper describes algorithms for transforming an interface timing specification with multiple clocks/edges into an equivalent specification with a single clock/edge for combinational circuit blocks. It formulates a new optimization problem, which is important but has never been addressed by CAD researchers. It identifies conditions under which this transformation can be performed efficiently without any loss of timing budget. The algorithm can be used to simplify the constraints to drive many synthesis and optimization algorithms.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"45 1","pages":"444-449"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74328714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Dynamic Thermal Management for MPEG-4 decoding","authors":"Inchoon Yeo, Heung-Ki Lee, Eun Jung Kim, K. H. Yum","doi":"10.1109/ICCD.2007.4601962","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601962","url":null,"abstract":"This paper proposes dynamic thermal management (DTM) based on a dynamic voltage and frequency scaling (DVFS) technique for MPEG-4 decoding to guarantee thermal safety while maintaining a quality of service (QoS) constraint. Although many low-power and low-temperature multimedia playback techniques have been proposed, most of them are impractical in real-time and have several restricting assumptions. Multimedia data consists of several frames requiring different decoding efforts. Since both temperature and performance of a multimedia system are affected by the complexity of scenes, our main idea is to use the information on scene complexity to find an appropriate frequency. In order to predict the complexity of the current scene, we extract information from the previous group of pictures (GOP) using feedback control with a display buffer. Experimental results with twelve movies show that our DTM scheme guarantees the threshold of temperature (70°C) while maintaining 0% frame miss ratio. Also, our DTM scheme decreases the average temperature by up to 13% without any additional hardware and playback latency.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"20 1","pages":"623-628"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75299153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VOSCH: Voltage scaled cache hierarchies","authors":"W. Wong, Cheng-Kok Koh, Yiran Chen, Hai Helen Li","doi":"10.1109/ICCD.2007.4601944","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601944","url":null,"abstract":"The cache hierarchy of state-of-the-art - especially multicore - microprocessors consumes a significant amount of area and energy. A significant amount of research has been devoted especially to reducing the latter. One of the most important microarchitectural techniques proposed for the energy management is dynamic voltage scaling (DVS). In DVS solutions, each cache operates at a number of different voltages. Most of the research in DVS techniques has focused on how the voltages can be adjusted and tuned. In this paper, we depart from the use of DVS for energy conservation by examining static voltage assignments for caches. We propose the use of voltage scaled cache hierarchies (VOSCH) as a means to conserve both static and dynamic energy. In VOSCH, the caches are powered at progressively lower supply voltages as the cache level increases. Compared to DVS solutions, VOSCH is simple, potentially more robust and can conserve more energy. We also experimented with more aggressive designs that included the addition of small cache structures to VOSCH. Even greater energy savings were achieved without having to sacrifice performance.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"164 1","pages":"496-503"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75339104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}