Yan Xu, Weichen Liu, Yu Wang, Jiang Xu, Xiaoming Chen, Huazhong Yang
{"title":"On-line MPSoC Scheduling Considering Power Gating Induced Power/Ground Noise","authors":"Yan Xu, Weichen Liu, Yu Wang, Jiang Xu, Xiaoming Chen, Huazhong Yang","doi":"10.1109/ISVLSI.2009.54","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.54","url":null,"abstract":"Power gating induced power/ground (P/G) noise is a major reliability problem faced by low power MPSoCs using power gating techniques. Powering on and off a processing unit in MPSoCs will induce large P/G noise and can cause timing divergence and even functional errors in surrounding processing units. P/G noise is different from thermal or energy effects, which are accumulative. The noise level should be predicted and victim circuits should be protected before the noise is induced. Hence, the power gating-aware scheduling problem with the consideration of P/G noise should be solved using an on-line method considering the run-time variation of tasks' execution time. In this paper, we formulate an on-line task scheduling problem with the consideration of P/G noise based on our detailed P/G noise analysis platform for MPSoC. An efficient on-line Greedy Heuristic (GH) algorithm that adapts well to real-time variation is proposed to reduce noise protection penalty and improve MPSoC performance. Our experiments show that the algorithm can achieve an average 26% performance improvement together with an average 73% noise protection penalty saving compared with the conservative stop-go method. We also compare our technique with a two-step solution that computes a static schedule at compile time and makes adjustments to the schedule according to runtime variations. For benchmarks with a larger number of tasks, the GH method achieves impressive performance improvement compared with the two-step solution.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132230358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless Compression Using Efficient Encoding of Bitmasks","authors":"C. Murthy, P. Mishra","doi":"10.1109/ISVLSI.2009.18","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.18","url":null,"abstract":"Lossless compression is widely used to improve both memory requirement and communication bandwidth in embedded systems. Dictionary based compression techniques are very popular because of their good compression efficiency and fast decompression mechanism. Bitmask based compression improves the effectiveness of the dictionary based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of bitmasks used in bitmask-based compression. We prove that an n-bit bitmask (records n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reducing decompression hardware overhead. We have applied our approach in a wide variety of domains including code compression, FPGA bitstream compression as well as control word compression. Our experimental results using a wide variety of benchmarks demonstrate that our approach improves the compression efficiency by 3 to 10% without adding any additional decompression overhead.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131181085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inducing Thermal-Awareness in Multicore Systems Using Networks-on-Chip","authors":"David Atienza Alonso, E. Martinez","doi":"10.1109/ISVLSI.2009.25","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.25","url":null,"abstract":"Technology scaling imposes an ever increasing temperature stress on digital circuit design due to transistor density, especially on highly integrated systems, such as Multi-Processor Systems-on-Chip (MPSoCs). Therefore, temperature-aware design is mandatory and should be performed at the early design stages. In this paper we present a novel hardware infrastructure to provide thermal control of MPSoC architectures, which is based on exploiting the NoC interconnects of the baseline system as an active component to communicate and coordinate between temperature sensors scattered around the chip, in order to globally monitor the actual temperature. Then, a thermal management unit and clock frequency controllers adjust the frequency and voltage of the processing elements according to the temperature requirements at run-time. We show experimental results of the infrastructure to implement effective global temperature control policies for a real-life 4-core MPSoC, emulated on an FPGA-based emulation framework.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"30 21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123703470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronization-Based Abstraction Refinement for Modular Verification of Asynchronous Designs","authors":"Hao Zheng, Haiqiong Yao, T. Yoneda","doi":"10.1109/ISVLSI.2009.16","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.16","url":null,"abstract":"This paper presents a modular verification approach for asynchronous circuits to address state explosion with a novel interface refinement method to reduce false counterexamples. This method borrows the idea of parallel composition, and it iteratively refines each component in a design by examining its interface interactions, and removes the behavior not synchronized with its neighbors. This method is further enhanced by synchronizing multiple components simultaneously so that inter-dependencies among components are considered. The experiments on several large asynchronous circuits show that this method efficiently removes impossible behavior from each component including ones violating correctness requirements.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115248202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Speed Parallel Architecture for Cyclic Convolution Based on FNT","authors":"Jian Zhang, Shuguo Li","doi":"10.1109/ISVLSI.2009.10","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.10","url":null,"abstract":"This paper presents a high speed parallel architecture for cyclic convolution based on Fermat Number Transform (FNT) in the diminished-1 number system. A code conversion method without addition (CCWA) and a butterfly operation method without addition (BOWA) are proposed to perform the FNT and its inverse (IFNT) except their final stages in the convolution. The pointwise multiplication in the convolution is accomplished by modulo 2^n+1 partial product multipliers (MPPM) and output partial products which are inputs to the IFNT. Thus modulo 2^n+1 carry propagation additions are avoided in the FNT and the IFNT except their final stages and the modulo 2^n+1 multiplier. The execution delay of the parallel architecture is markedly reduced due to the decrease in modulo 2^n+1 carry-propagation additions. Compared with the existing cyclic convolution architecture, the proposed one has better throughput performance and involves less hardware complexity. Synthesis results using 130nm CMOS technology demonstrate the superiority of the proposed architecture over the reported solution.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121190033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-the-Fly Evaluation of FPGA-Based True Random Number Generator","authors":"R. Santoro, O. Sentieys, S. Roy","doi":"10.1109/ISVLSI.2009.33","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.33","url":null,"abstract":"Many embedded security chips require a high-quality digital True Random Number Generator (TRNG). Recently, some new TRNGs with innovative architectures have been proposed in the literature. Moreover, some of them don't need to use the post-processing unit usually required in TRNG constructions. As a result, the TRNG data rate is enhanced and the produced random bits only depend on the noise source and its sampling. However, selecting a TRNG can be a delicate problem. In a hardware context (e.g. Field-Programmable Gate Array (FPGA) or Application-Specific Integrated Circuit (ASIC) implementation), the design area and power consumption are important criteria. To the best of our knowledge, no effective comparison of several TRNGs appears in the literature. This paper evaluates the randomness behavior, the area and the power consumption of the latest TRNGs. These investigations are carried out under real conditions by implementing the TRNGs in FPGA circuits.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133360546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhimin Chen, Raghunandan Nagesh, A. Reddy, P. Schaumont
{"title":"Increasing the Sensitivity of On-Chip Digital Thermal Sensors with Pre-Filtering","authors":"Zhimin Chen, Raghunandan Nagesh, A. Reddy, P. Schaumont","doi":"10.1109/ISVLSI.2009.31","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.31","url":null,"abstract":"Thermal monitoring has been broadly used to protect high-end integrated circuits from over-heating and to identify hot-spots in complex circuits. In this paper, we present a method to increase the sensitivity of an on-chip digital thermal sensor. In contrast to the existing mechanisms that characterize the overall temperature profile on a die, our solution is able to detect the submerged thermal variation caused by specific predefined events (SPE), under the precondition that the SPE’s dominant frequency does not overlap with those of other thermal events. This is made possible by pre-filtering of the temperature value. A demonstrator is implemented in an ordinary FPGA, in which the SPE is a person’s finger touching the FPGA package. We successfully show that our design can correctly and reliably detect the finger touching event while ignoring larger variations from other causes. Because the finger touching event has no other special characteristics except for its unique frequency, we conclude that our solution is also applicable to other SPEs, especially low-frequency ones. In general, our method is sensitive, reliable and also flexible.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132563748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandre K. I. Mendonça, D. Volpato, José Luís Almada Güntzel, L. Santos
{"title":"Mapping Data and Code into Scratchpads from Relocatable Binaries","authors":"Alexandre K. I. Mendonça, D. Volpato, José Luís Almada Güntzel, L. Santos","doi":"10.1109/ISVLSI.2009.28","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.28","url":null,"abstract":"Scratchpad memories (SPMs) are promising for energy-efficient embedded systems. Most optimizing techniques for mapping data and code elements to SPMs assume the availability of source code. However, embedded software development has to cope with legacy code, third-party software, and IP-protected applications for which only the binaries are available. The few techniques that directly handle binaries operate on executable files and are limited to either code or data. This work proposes a new technique that addresses both data and code allocation into SPMs. Since it operates directly on binaries, the technique allows library elements to be eligible for SPM mapping. It consists of three main engines: a profiler, a mapper and a patcher. The patcher was designed to operate upon relocatable object binaries so as to overcome the inefficiency of bookkeeping SPM relocations on executable binaries. As compared to code-only SPM mapping, an average energy saving of 15% was obtained for a varied set of benchmark programs and memory configurations. Savings around 47% were reached for the two programs with higher static data content. The average patching time was 0.23s on a quad-core workstation.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123484776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Speed Low-Current Duobinary Signaling Over Active Terminated Chip-to-Chip Interconnect","authors":"V. Pasupureddi, P. Mandal, Sunil Sachdev","doi":"10.1109/ISVLSI.2009.9","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.9","url":null,"abstract":"In this work we propose a high-speed low-current duobinary signaling scheme over an active terminated chip-to-chip interconnect. The active termination scheme eliminates the need for any dedicated passive terminator at both the transmitter and the receiver, avoiding signal reflection. Elimination of the passive terminator helps to reduce the transmitted signal level without affecting signal detectability at the receiver and also removes the thermal noise of the terminator. To implement bandwidth-efficient duobinary signaling, we present a current-mode high-speed precoder operating at 10-Gb/s. A low-current active terminated driver based on a modified Cherry-Hooper topology is proposed. At the receive end, we propose an active terminated current-mode receiver (Rx) with a regulated gate cascode (RGC) based transimpedance amplifier (TIA). Folded active inductor peaking is used to enhance the bandwidth of this TIA. We also propose a low-power broadband equalizer topology for channel equalization. The duobinary transmitter and receiver circuits are implemented in 1.8-V, 0.18-μm Digital CMOS technology with an f_T of 27-GHz. The designed high speed duobinary Tx/Rx circuits work at up to 8-Gb/s while transmitting the data over an FR4 PCB trace of length 29.5-inch for the targeted bit-error-rate (BER) of 10^−12. The power consumed in the transmitter and receiver circuits is 42.9-mW at 8-Gb/s.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126483913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An 8-bit 1.8 V 500 MSPS CMOS Segmented Current Steering DAC","authors":"Santanu Sarkar, S. Banerjee","doi":"10.1109/ISVLSI.2009.12","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.12","url":null,"abstract":"This paper presents the design of an 8-bit 1.8 V segmented current steering (CS) digital-to-analog converter (DAC) using 0.18 μm double poly five metal CMOS technology. The DAC has been segmented as 6+2 to achieve optimum performance for minimum area. The simulation result shows a maximum DNL of 0.30 LSB and an INL of 0.33 LSB. The midcode glitch is 0.27 pV s. The simulated SNDR and SFDR of the segmented DAC are 52.13 dB and 44.83 dB respectively. The settling time of the segmented DAC is 6.02 ns. The power consumption is simulated as 7.88 mW. The prototype will be used in telecommunication applications.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128182074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}