{"title":"A generic implementation of a quantified predictor on FPGAs","authors":"G. Thomas, A. Elhossini, B. Juurlink","doi":"10.1145/2591513.2591517","DOIUrl":"https://doi.org/10.1145/2591513.2591517","url":null,"abstract":"Predictors are used in many fields of computer architectures to enhance performance. With good estimations of future system behaviour, policies can be developed to improve system performance or reduce power consumption. These policies become more effective if the predictors are implemented in hardware and can provide quantified forecasts and not only binary ones. In this paper, we present and evaluate a generic predictor implemented in VHDL running on an FPGA which produces quantified forecasts. Moreover, a complete scalability analysis is presented which shows that our implementation has a maximum device utilization of less than 5%. Furthermore, we analyse the power consumption of the predictor running on an FPGA. Additionally, we show that this implementation can be clocked by over 210 MHz. Finally, we evaluate a power-saving policy based on our hardware predictor. Based on predicted idle periods, this power-saving policy uses power-saving modes and is able to reduce memory power consumption by 14.3%.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115274740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WeDBless: weighted deflection bufferless router for mesh NoCs","authors":"Simi Zerine Sleeba, John Jose, M. G. Mini","doi":"10.1145/2591513.2591559","DOIUrl":"https://doi.org/10.1145/2591513.2591559","url":null,"abstract":"Bufferless NoC routers employing deflection routing are gaining popularity due to their power and area efficiency. We propose WeDBless, a bufferless deflection router that reduces deflection rate of flits by employing port allocation based on weighted deflection of flits. The proposed method directs the frequently misrouted flits towards their destination by increasing their probability of getting a productive output port. Our evaluations on synthetic traffic patterns show that WeDBless achieves significant reduction in deflection rate, average flit latency and improvement in network saturation point compared to the state-of-the-art bufferless router and reduced complexity in route computing logic.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116742907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel mixed-signal self-calibration technique for baseband filters in systems-on-chip mobile transceivers","authors":"Yongsuk Choi, Yong-Bin Kim","doi":"10.1145/2591513.2591522","DOIUrl":"https://doi.org/10.1145/2591513.2591522","url":null,"abstract":"This paper presents a novel digitally-assisted automatic frequency tuning technique, and the self calibration technique is verified for a 130nm CMOS 4th order biquad baseband low-pass filter case with 20MHz cut-off frequency, which satisfies the typical LTE receiver specifications. The proposed tuning method includes hardware reduction methods, coherent sampling, and magnitude calculator using \"alpha max plus beta min\" algorithm for significant chip area reduction with negligible accuracy degradation. The cut-off frequency turns out to be tunable in the range of 16.2MHz to 24.4MHz, and the tuning error is less than 0.4% over the whole frequency tuning range. The estimated area consumption is 0.027mm2 with 80% device density, and power dissipation is 0.16mW at 128MHz clock speed with a 1.2V supply voltage.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124812799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Pereira, A. Soares, A. Susin, A. Bonatto, M. Negreiros
{"title":"H.264 8x8 inverse transform architecture optimization","authors":"F. Pereira, A. Soares, A. Susin, A. Bonatto, M. Negreiros","doi":"10.1145/2591513.2591564","DOIUrl":"https://doi.org/10.1145/2591513.2591564","url":null,"abstract":"This paper presents a resource optimized hardware solution to perform the H.264 8x8 inverse transform. Row/column decomposition is used, arithmetic units are re-used and the transpose memory is replaced by a shift register. The architecture is able to perform 8x8 integer transform calculation in 144 cycles with as few as 431 LUTs on a Xilinx virtex 6 FPGA for 16-bit resolution. To enable the module to process all inverse transforms in H.264, the number of LUTs is increased to 681. When used to calculate all transforms for H.264 videos, the design supports resolutions up to 1280x720@30fps when running at 84 MHz.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125103776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA based implementation of a genetic algorithm for ARMA model parameters identification","authors":"H. Merabti, D. Massicotte","doi":"10.1145/2591513.2591579","DOIUrl":"https://doi.org/10.1145/2591513.2591579","url":null,"abstract":"In this paper, we propose an FPGA implementation of a genetic algorithm (GA) for linear and nonlinear auto regressive moving average (ARMA) model parameters identification. The GA features specifically designed genetic operators for adaptive filtering applications. The design was implemented using very low bit-wordlength fixed-point representation, where only 6-bit wordlength arithmetic was used. The implementation experiments show high parameters identification capabilities and low footprint.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124534332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marta Ortín-Obón, L. Ramini, H. Tatenguem, V. Viñals, D. Bertozzi
{"title":"A complete electronic network interface architecture for global contention-free communication over emerging optical networks-on-chip","authors":"Marta Ortín-Obón, L. Ramini, H. Tatenguem, V. Viñals, D. Bertozzi","doi":"10.1145/2591513.2591536","DOIUrl":"https://doi.org/10.1145/2591513.2591536","url":null,"abstract":"Although many valuable research works have investigated the properties of optical networks-on-chip (ONoCs), the vast majority of them lack an accurate exploration of the network interface architecture (NI) required to support optical communications on the silicon chip. The complexity of this architecture is especially critical for a specific kind of ONoCs: wavelength-routed ones. From a logical viewpoint, they can be considered as full nonblocking crossbars, thus the control complexity is implemented at the NIs. To our knowledge, this paper proposes the first complete NI architecture for wavelength-routed optical NoCs, by coping with the intricacy of networking issues such as flow control, buffering strategy, deadlock avoidance, serialization, and above all, with their codesign in a complete architecture.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114319873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI implementation of linear MIMO detection with boosted communications performance: extended abstract","authors":"Dominik Auras, D. Rieth, R. Leupers, G. Ascheid","doi":"10.1145/2591513.2591551","DOIUrl":"https://doi.org/10.1145/2591513.2591551","url":null,"abstract":"A novel class of linear soft-input soft-output detectors featuring boosted communications performance is introduced. Compared to state-of-the-art linear detectors, the detector has an SNR gain of up to 2.4 dB. We shortly summarize the algorithm, and sketch a suitable architecture. The corresponding ASIC implementation shows the feasibility and efficiency of the concept. It achieves the IEEE 802.11n standard's peak data rate of 600 Mbit/s.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128328407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thermal-aware phase-based tuning of embedded systems","authors":"Tosiron Adegbija, A. Gordon-Ross","doi":"10.1145/2591513.2591586","DOIUrl":"https://doi.org/10.1145/2591513.2591586","url":null,"abstract":"Due to embedded systems' stringent design constraints, much prior work focused on optimizing energy consumption and/or performance. However, since embedded systems have fewer cooling options, rising temperature, and thus temperature optimization, is an emergent concern. We present thermal-aware phase-based tuning--TaPT--that determines Pareto optimal configurations for fine-grained execution time, energy, and temperature tradeoffs. Results show that TaPT reduces execution time, energy, and temperature by as much as 5%, 30%, and 25%, respectively, while adhering to designer-specified design constraints.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128854642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cong Xu, Dimin Niu, Yang Zheng, Shimeng Yu, Yuan Xie
{"title":"Reliability-aware cross-point resistive memory design","authors":"Cong Xu, Dimin Niu, Yang Zheng, Shimeng Yu, Yuan Xie","doi":"10.1145/2591513.2591528","DOIUrl":"https://doi.org/10.1145/2591513.2591528","url":null,"abstract":"The transition metal oxide (TMO) resistive random access memory (ReRAM) has been identified as one of the most promising candidates for the next generation non-volatile memory (NVM) technology. Numerous TMO ReRAMs with different materials have been developed and demonstrate attractive characteristics, such as fast read/write speed, low power consumption, high integrated density, and good scalability. Among them, the most attractive characteristic of ReRAM is its cross-point structure which features a 4F2 cell size. However, the existence of sneak current and voltage drop along the wire resistance in a cross-point array brings in extra design challenges. In addition, a robust ReRAM design needs to deal with both soft and hard errors. In this paper, we summarize mechanisms of both soft and hard errors of ReRAM cells and propose a unified model to characterize different failure behaviors. We quantitatively analyze the impact of cell failure modes on the reliability of cross-point array. We also propose an error resilient architecture which avoids unnecessary writes in the hard error detection unit. Experimental results show that our design can extend the lifetime of ReRAM up to 75% over the design without hard error detections and up to 12% over the design with \"write-verify\" detection mechanism.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130120476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Wang, Chao Chen, Piyush Sharma, A. Chattopadhyay
{"title":"System-level reliability exploration framework for heterogeneous MPSoC","authors":"Z. Wang, Chao Chen, Piyush Sharma, A. Chattopadhyay","doi":"10.1145/2591513.2591519","DOIUrl":"https://doi.org/10.1145/2591513.2591519","url":null,"abstract":"Power density of digital circuits increased at alarming rate for deep sub-micron CMOS technology, turning reliability into a serious design concern. On the other hand, ever-growing task complexity with strict performance budget forced designers to adopt complex, heterogeneous MPSoCs as the implementation choice. Several commercial system-level design platforms exist currently for design, exploration and implementation of MPSoC. In this paper, we propose a system-level reliability exploration framework by extending a commercial system-level design flow. Using this framework, a heterogeneous MPSoC is designed which can accept a custom mapping algorithm based on the MPSoC topology before the actual task deployment. The dynamic reliability-aware task management is able to consider the desired reliability constraints of tasks as well as reliability levels of the system components. We report our experimental findings using state-of-the-art benchmark applications.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115574302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}