Koushik Chakraborty, Brennan Cozzens, Sanghamitra Roy, D. Ancajas
{"title":"Efficiently tolerating timing violations in pipelined microprocessors","authors":"Koushik Chakraborty, Brennan Cozzens, Sanghamitra Roy, D. Ancajas","doi":"10.1145/2463209.2488860","DOIUrl":"https://doi.org/10.1145/2463209.2488860","url":null,"abstract":"Early prediction of an upcoming timing violation presents a tremendous opportunity to mask the performance overhead of tolerating these faults. In this paper, we explore several techniques for optimizing instruction scheduling in an Out-of-Order pipeline, exploiting this new perspective in robust system design. Compared to recently proposed stall based techniques for tolerating predictable timing violations, we demonstrate a massive reduction in performance overhead, while supporting correct execution in faulty environments (64-97% across different benchmarks).","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123895919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, J. Cong
{"title":"Memory partitioning for multidimensional arrays in high-level synthesis","authors":"Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, J. Cong","doi":"10.1145/2463209.2488748","DOIUrl":"https://doi.org/10.1145/2463209.2488748","url":null,"abstract":"Memory partitioning is widely adopted to efficiently increase the memory bandwidth by using multiple memory banks and reducing data access conflict. Previous methods for memory partitioning mainly focused on one-dimensional arrays. As a consequence, designers must flatten a multidimensional array to fit those methodologies. In this work we propose an automatic memory partitioning scheme for multidimensional arrays based on linear transformation to provide high data throughput of on-chip memories for the loop pipelining in high-level synthesis. An optimal solution based on Ehrhart points counting is presented, and a heuristic solution based on memory padding is proposed to achieve a near optimal solution with a small logic overhead. Compared to the previous one-dimensional partitioning work, the experimental results show that our approach saves up to 21% of block RAMs, 19% in slices, and 46% in DSPs.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124016694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iraklis Anagnostopoulos, Vasileios Tsoutsouras, A. Bartzas, D. Soudris
{"title":"Distributed run-time resource management for malleable applications on many-core platforms","authors":"Iraklis Anagnostopoulos, Vasileios Tsoutsouras, A. Bartzas, D. Soudris","doi":"10.1145/2463209.2488942","DOIUrl":"https://doi.org/10.1145/2463209.2488942","url":null,"abstract":"Today's prevalent solutions for modern embedded systems and general computing employ many processing units connected by an on-chip network, leaving behind complex superscalar architectures. In this paper, we couple the concept of distributed computing with parallel applications and present a workload-aware distributed run-time framework for malleable applications on many-core platforms. The presented framework is responsible for serving in a distributed way and at run-time, the needs of malleable applications, maximizing resource utilization avoiding dominating effects and taking into account the type of processors supporting platform heterogeneity, while having a small overhead in overall inter-core communication. Our framework has been implemented as part of a C simulator and additionally as a runtime service on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs, and we compared it against a state-of-art run-time resource manager. Experimental results showed that our framework has on average 70% less messages, 64% smaller message size and 20% application speed-up gain.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"6 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114110713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Shafique, Semeen Rehman, Pau Vilimelis Aceituno, J. Henkel
{"title":"Exploiting program-level masking and error propagation for constrained reliability optimization","authors":"M. Shafique, Semeen Rehman, Pau Vilimelis Aceituno, J. Henkel","doi":"10.1145/2463209.2488755","DOIUrl":"https://doi.org/10.1145/2463209.2488755","url":null,"abstract":"Since embedded systems design involves stringent design constraints, designing a system for reliability requires optimization under tolerable overhead constraints. This paper presents a novel reliability-driven compilation scheme for software program reliability optimization under tolerable overhead constraints. Our scheme exploits program-level error masking and propagation properties to perform reliability-driven prioritization of instructions and selective protection during compilation. To enable this, we develop statistical models for estimating error masking and propagation probabilities. Our scheme provides significant improvement in reliability efficiency (avg. 30%-60%) compared to state-of-the-art program-level protection schemes.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"8 24","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113946490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ABCD-L: Approximating continuous linear systems using Boolean models","authors":"Aadithya V. Karthik, J. Roychowdhury","doi":"10.1145/2463209.2488811","DOIUrl":"https://doi.org/10.1145/2463209.2488811","url":null,"abstract":"In this supplement, we provide additional context for ABCD-L and place our contributions in perspective, relative to the existing body of literature on topics like AMS modelling/verification, Boolean and hybrid systems frameworks, etc. Further, we demonstrate that ABCD-L can be applied in conjunction with Model Order Reduction (MOR) techniques, to Booleanize large LTI systems whose direct eigendecomposition may be computationally infeasible. For example, we combine ABCD-L with Arnoldi iteration based MOR to efficiently produce accurate Boolean models of a real-world power grid network (with 25849 nodes) obtained from a benchmark set made available by IBM. Due to space constraints, we were unable to include such material within our main manuscript.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124527340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khawar Shahzad, A. Khalid, Z. Rákossy, G. Paul, A. Chattopadhyay
{"title":"CoARX: A coprocessor for ARX-based cryptographic algorithms","authors":"Khawar Shahzad, A. Khalid, Z. Rákossy, G. Paul, A. Chattopadhyay","doi":"10.1145/2463209.2488898","DOIUrl":"https://doi.org/10.1145/2463209.2488898","url":null,"abstract":"Cryptographic coprocessors are inherent part of modern System-on-Chips. It serves dual purpose: efficient execution of cryptographic kernels and supporting protocols for preventing IP-piracy. Flexibility in such coprocessors is required to provide protection against emerging cryptanalytic schemes and to support different cryptographic functions like encryption and authentication. In this context, a novel crypto-coprocessor, named CoARX, supporting multiple cryptographic algorithms based on Addition (A), Rotation (R) and eXclusive-or (X) operations is proposed. CoARX supports diverse ARX-based cryptographic primitives. We show that compared to dedicated hardware implementations and general-purpose microprocessors, it offers excellent performance-flexibility trade-off including adaptability to resist generic cryptanalysis.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134007318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving charging efficiency with workload scheduling in energy harvesting embedded systems","authors":"Yukan Zhang, Yang Ge, Qinru Qiu","doi":"10.1145/2463209.2488803","DOIUrl":"https://doi.org/10.1145/2463209.2488803","url":null,"abstract":"In energy harvesting embedded systems, if the harvested power is sufficient for the workload, extra power will be stored in the electrical energy storage (EES) bank. How much energy can be stored is affected by many factors including the efficiency of the energy harvesting module, the input/output voltage of the DC-DC converters, the status of the EES elements, and the characteristics of the workload. This paper investigates the impact of workload scheduling of the embedded system on the storage efficiency of the EES bank. We first provide an approximated but accurate power consumption model of the DC-DC converter. Based on this model, we analytically prove that an optimal workload schedule is to always execute high power task first. Experimental results confirm that proposed scheduling strategy outperforms all other possible scheduling and increases the amount of stored energy by up to 10.41% in average.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130975769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hang Lu, Guihai Yan, Yinhe Han, Binzhang Fu, Xiaowei Li
{"title":"RISO: Relaxed network-on-chip isolation for cloud processors","authors":"Hang Lu, Guihai Yan, Yinhe Han, Binzhang Fu, Xiaowei Li","doi":"10.1145/2463209.2488781","DOIUrl":"https://doi.org/10.1145/2463209.2488781","url":null,"abstract":"Cloud service providers use workload consolidation technique in many-core cloud processors to optimize system utilization and augment performance for ever extending scale-out workloads. Performance isolation usually has to be enforced for the consolidated workloads sharing the same many-core resources. Networks-on-chip (NoC) serves as a major shared resource, also needs to be isolated to avoid violating performance isolation. Prior work uses strict network isolation to fulfill performance isolation. However, strict network isolation either results in low consolidation density, or complex routing mechanisms which indicates prohibitive high hardware cost and large latency. In view of this limitation, we propose a novel NoC isolation strategy for many-core cloud processors, called relaxed isolation (RISO). It permits underutilized links to be shared by multiple applications, at the same time keeps the aggregated traffic in check to enforce performance isolation. The experimental results show that the consolidation density is improved more than 12% in comparison with previous strict isolation scheme, meanwhile reducing network latency by 38.4% on average.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133501083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mona Yousofshahi, M. Orshansky, Kyongbum Lee, S. Hassoun
{"title":"Gene modification identification under flux capacity uncertainty","authors":"Mona Yousofshahi, M. Orshansky, Kyongbum Lee, S. Hassoun","doi":"10.1145/2463209.2488789","DOIUrl":"https://doi.org/10.1145/2463209.2488789","url":null,"abstract":"Re-engineering cellular behavior promises to advance the production of commercially significant biomolecules and to enhance cellular function for many applications. To achieve a desired cellular objective, it is necessary to identify within a metabolic network a set of reactions whose fluxes should be changed using gene modifications. We develop a computational method, CCOpt, to optimize the selection of an intervention set that consists of gene up/down-regulation using uncertainty-aware chance-constrained optimization. In contrast to deterministic approaches where constraints are met with 100% certainty, constraints in CCOpt are probabilistically met at a user-specified confidence level. We investigate the application of CCOpt to two case studies that utilize the Chinese Hamster Ovary (CHO) cell metabolism. Our results demonstrate that CCOpt is capable of identifying optimal intervention sets without the run-time cost of a sampling based (Monte Carlo) approach.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132566547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harry Wagstaff, Miles Gould, Björn Franke, N. Topham
{"title":"Early partial evaluation in a JIT-compiled, retargetable instruction set simulator generated from a high-level architecture description","authors":"Harry Wagstaff, Miles Gould, Björn Franke, N. Topham","doi":"10.1145/2463209.2488760","DOIUrl":"https://doi.org/10.1145/2463209.2488760","url":null,"abstract":"Modern processor design tools integrate in their workflows generators for instruction set simulators (ISS) from architecture descriptions. Whilst these generated simulators are useful for design evaluation and software development, they suffer from poor performance. We present an ultra-fast JIT-compiled ISS generated from an ArchC description. We also introduce a novel partial evaluation optimisation, which further improves JIT compilation time and code quality. This results in a simulation rate of 510 MIPS for an ARM target across 45 EEMBC and SPEC benchmarks. On average, our ISS is 1.7 times faster than SimIt-ARM, one of the fastest ISS generated from an architecture description.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133239978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}