Yatish Turakhia, B. Raghunathan, S. Garg, Diana Marculescu
{"title":"HaDeS: Architectural synthesis for heterogeneous dark silicon chip multi-processors","authors":"Yatish Turakhia, B. Raghunathan, S. Garg, Diana Marculescu","doi":"10.1145/2463209.2488948","DOIUrl":"https://doi.org/10.1145/2463209.2488948","url":null,"abstract":"In this paper, we propose an efficient iterative optimization based approach for architectural synthesis of dark silicon heterogeneous chip multi-processors (CMPs). The goal is to determine the optimal number of cores of each type to provision the CMP with, such that the area and power budgets are met and the application performance is maximized. We consider general-purpose multi-threaded applications with a varying degree of parallelism (DOP) that can be set at run-time, and propose an accurate analytical model to predict the execution time of such applications on heterogeneous CMPs. Our experimental results illustrate that the synthesized heterogeneous dark silicon CMPs provide between 19% to 60% performance improvements over conventional homogeneous designs for variable and fixed DOP scenarios, respectively.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126589485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sidharta Andalam, A. Girault, R. Sinha, P. Roop, J. Reineke
{"title":"Precise timing analysis for direct-mapped caches","authors":"Sidharta Andalam, A. Girault, R. Sinha, P. Roop, J. Reineke","doi":"10.1145/2463209.2488917","DOIUrl":"https://doi.org/10.1145/2463209.2488917","url":null,"abstract":"Safety-critical systems require guarantees on their worst-case execution times. This requires modelling of speculative hardware features such as caches that are tailored to improve the average-case performance, while ignoring the worst case, which complicates the Worst Case Execution Time (WCET) analysis problem. Existing approaches that precisely compute WCET suffer from state-space explosion. In this paper, we present a novel cache analysis technique for direct-mapped instruction caches with the same precision as the most precise techniques, while improving analysis time by up to 240 times. This improvement is achieved by analysing individual control points separately, and carrying out optimisations that are not possible with existing techniques.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125513219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyungmin Cho, S. Mirkhani, Chen-Yong Cher, J. Abraham, S. Mitra
{"title":"Quantitative evaluation of soft error injection techniques for robust system design","authors":"Hyungmin Cho, S. Mirkhani, Chen-Yong Cher, J. Abraham, S. Mitra","doi":"10.1145/2463209.2488859","DOIUrl":"https://doi.org/10.1145/2463209.2488859","url":null,"abstract":"Choosing the correct error injection technique is of primary importance in simulation-based design and evaluation of robust systems that are resilient to soft errors. Many low-level (e.g., flip-flop-level) error injection techniques are generally used for small systems due to long execution times and significant memory requirements. High-level error injections at the architecture or memory levels are generally fast but can be inaccurate. Unfortunately, there exists very little research literature on quantitative analysis of the inaccuracies associated with high-level error injection techniques. In this paper, we use simulation and emulation results to understand the accuracy tradeoffs associated with a variety of high-level error injection techniques. A detailed analysis of error propagation explains the causes of high degrees of inaccuracies associated with error injection techniques at higher levels of abstraction.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126859354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proactive circuit allocation in multiplane NoCs","authors":"Ahmed Abousamra, A. Jones, R. Melhem","doi":"10.1145/2463209.2488778","DOIUrl":"https://doi.org/10.1145/2463209.2488778","url":null,"abstract":"This work explores a method for efficient pre-allocation of circuits in network-on-chip (NoC) to reduce communication latency and improve performance. Circuit pre-allocation eliminates the time cost of circuit establishment by using request messages to reserve the circuits for their anticipated reply messages. Requests reserve circuits in a priority order rather than for a particular time slot, avoiding delays or blocking even if the newly requested circuits conflict with previously reserved ones. Benchmark simulations show speedup in execution time of up to 16%, with an average of 8% for communication sensitive benchmarks, over a leading proposal in pre-configuring circuits.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"164 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127527618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Path to a TeraByte of on-chip memory for petabit per second bandwidth with < 5Watts of power","authors":"Swaroop Ghosh","doi":"10.1145/2463209.2488913","DOIUrl":"https://doi.org/10.1145/2463209.2488913","url":null,"abstract":"We propose a path to achieve an ambitious target that has never been tried before: a terabyte of on-chip memory for petabit/second of bandwidth with < 5W of power. Conventional methodology of on-chip memory design is bottom up where the choice of bitcell topology and associated peripherals are predetermined. The resulting memory is sub-optimal and often suffers from high power and poor bandwidth. We approach this problem from top down where the capacity, bandwidth and power specifications guide the choice of bitcell. Our evaluation shows that domain wall memory (DWM) can be a potential technology that can meet TB capacity and Pb/s bandwidth with shoestring power budget.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129026129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A field-programmable pin-constrained digital microfluidic biochip","authors":"D. Grissom, P. Brisk","doi":"10.1145/2463209.2488790","DOIUrl":"https://doi.org/10.1145/2463209.2488790","url":null,"abstract":"As digital microfluidic biochips (DMFBs) have matured over the last decade, efforts have been made to 1.) reduce the cost, and 2.) produce general-purpose chips. While work done to generalize DMFBs typically depends on the flexibility of individually controlled electrodes, such devices have high wiring complexity, which requires costly multi-layer printed circuit boards (PCBs). In contrast, pin-constrained DMFBs reduce the wiring complexity, but reduce the flexibility of droplet coordination. We present a field-programmable pin-constrained DMFB that leverages the cost-savings of pin-constrained designs, but is general-purpose, rather than assay-specific. We show that with just a few more pins than the state-of-the-art pin-constrained designs, we can execute arbitrary assays almost as fast as the most recent general-purpose DMFB designs.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127308527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DuraCache: A durable SSD cache using MLC NAND flash","authors":"Ren-Shuo Liu, Chia-Lin Yang, Cheng-Hsuan Li, Geng-You Chen","doi":"10.1145/2463209.2488939","DOIUrl":"https://doi.org/10.1145/2463209.2488939","url":null,"abstract":"Adopting SSDs as caches for HDD arrays has gained popularity in datacenters because SSDs are superior in handling random reads that HDDs cannot efficiently deal with. Two types of flash memory cells are available for building SSD caches, single-level cells (SLC) and multi-level cells (MLC). MLC is more appealing than SLC because it can achieve higher cache capacity at the same cost. However, we see a critical issue for SSD caches to adopt MLC NAND flash: the endurance of modern MLC NAND flash is too low to sustain datacenter workloads. In this paper, we propose DuraCache that addresses the durability issue of SSD caches. DuraCache exploits the fact that SSD caches are write-through caches in datacenters. Therefore, uncorrectable errors in SSD caches can be handled like cache misses which bring in correct data from HDD arrays. In addition, DuraCache gradually allocates more ECC parities associated with data when NAND flash reaches wearout thresholds. This allows SSD caches to continue operating by sacrificing available capacity. We conduct empirical experiments and demonstrate that DuraCache enables MLC SSD caches to achieve 4.1 years of service life assuming a TPC-C workload.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114913574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On testing timing-speculative circuits","authors":"F. Yuan, Yannan Liu, W. Jone, Q. Xu","doi":"10.1145/2463209.2488771","DOIUrl":"https://doi.org/10.1145/2463209.2488771","url":null,"abstract":"By allowing the occurrence of infrequent timing errors and correcting them online, circuit-level timing speculation is one of the most promising variation-tolerant design techniques. How to effectively test timing-speculative circuits, however, has not been addressed in the literature. This is a challenging problem because conventional scan techniques cannot provide sufficient controllability and observability for such circuits. In this paper, we propose novel techniques to achieve high fault coverage for timing-speculative circuits without incurring high design-for-testability cost. Experimental results on various benchmark circuits demonstrate the effectiveness of the proposed solution.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128205338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-efficient on-chip generation of time-extensive constrained-random sequences for in-system validation","authors":"A. Kinsman, Ho Fai Ko, N. Nicolici","doi":"10.1145/2463209.2488882","DOIUrl":"https://doi.org/10.1145/2463209.2488882","url":null,"abstract":"Linear Feedback Shift Registers (LFSRs) have been extensively used for compressed manufacturing test. They have been recently employed as a foundation for porting constrained-random stimuli from a pre-silicon verification environment to in-system validation. This work advances this concept by improving both the hardware efficiency and the duration of in-system validation experiments.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128330459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scenario-based run-time task mapping algorithm for MPSoCs","authors":"W. Quan, A. Pimentel","doi":"10.1145/2463209.2488895","DOIUrl":"https://doi.org/10.1145/2463209.2488895","url":null,"abstract":"The application workloads in modern MPSoC-based embedded systems are becoming increasingly dynamic. Different applications concurrently execute and contend for resources in such systems which could cause serious changes in the intensity and nature of the workload demands over time. To cope with the dynamism of application workloads at run time and improve the efficiency of the underlying system architecture, this paper presents a novel scenario-based run-time task mapping algorithm. This algorithm combines a static mapping strategy based on workload scenarios and a dynamic mapping strategy to achieve an overall improvement of system efficiency. We evaluated our algorithm using a homogeneous MPSoC system with three real applications. From the results, we found that our algorithm achieves an 11.3% performance improvement and a 13.9% energy saving compared to running the applications without using any run-time mapping algorithm. When comparing our algorithm to three other, well-known run-time mapping algorithms, it is superior to these algorithms in terms of quality of the mappings found while also reducing the overheads compared to most of these algorithms.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133112777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}