Karina Gitina, Sven Reimer, M. Sauer, Ralf Wimmer, Christoph Scholl, B. Becker
{"title":"Equivalence checking of partial designs using dependency quantified Boolean formulae","authors":"Karina Gitina, Sven Reimer, M. Sauer, Ralf Wimmer, Christoph Scholl, B. Becker","doi":"10.1109/ICCD.2013.6657071","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657071","url":null,"abstract":"We consider the partial equivalence checking problem (PEC), i. e., checking whether a given partial implementation of a combinational circuit can (still) be extended to a complete design that is equivalent to a given full specification. To solve PEC, we give a linear transformation from PEC to the question whether a dependency quantified Boolean formula (DQBF) is satisfied. Our novel algorithm to solve DQBF based on quantifier elimination can therefore be applied to solve PEC.We also present first experimental results showing the feasibility of our approach and the inaccuracy of QBF approximations, which are usually used for deciding the PEC so far.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122548560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alen Bardizbanyan, Magnus Själander, D. Whalley, P. Larsson-Edefors
{"title":"Speculative tag access for reduced energy dissipation in set-associative L1 data caches","authors":"Alen Bardizbanyan, Magnus Själander, D. Whalley, P. Larsson-Edefors","doi":"10.1109/ICCD.2013.6657057","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657057","url":null,"abstract":"Due to performance reasons, all ways in set-associative level-one (L1) data caches are accessed in parallel for load operations even though the requested data can only reside in one of the ways. Thus, a significant amount of energy is wasted when loads are performed. We propose a speculation technique that performs the tag comparison in parallel with the address calculation, leading to the access of only one way during the following cycle on successful speculations. The technique incurs no execution time penalty, has an insignificant area overhead, and does not require any customized SRAM implementation. Assuming a 16kB 4-way set-associative L1 data cache implemented in a 65-nm process technology, our evaluation based on 20 different MiBench benchmarks shows that the proposed technique on average leads to a 24% data cache energy reduction.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123552029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resonant frequency divider design methodology for dynamic frequency scaling","authors":"Y. Teng, B. Taskin","doi":"10.1109/ICCD.2013.6657087","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657087","url":null,"abstract":"A rotary traveling wave oscillator (RTWO) frequency divider design methodology is proposed for dynamic frequency scaling. The proposed methodology can be used for designing dividers for integer division ratios of 3 to 9 within one circuit topology. HSPICE-based experiments are performed to test the electrical characteristics of the RTWO frequency dividers. The simulation results show that the power consumption of a frequency divider is as low as approximately 5mW for different frequency division ratios.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130200953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keun Sup Shim, Mieszko Lis, Myong Hyon Cho, Ilia A. Lebedev, S. Devadas
{"title":"Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine","authors":"Keun Sup Shim, Mieszko Lis, Myong Hyon Cho, Ilia A. Lebedev, S. Devadas","doi":"10.1109/ICCD.2013.6657037","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657037","url":null,"abstract":"As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM2), establishes a new design point. On the one hand, EM2 supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM2 chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM2 is a fairly large design-110 cores using a total of 357 million transistors-the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125355969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sneak path testing and fault modeling for multilevel memristor-based memories","authors":"Sachhidh Kannan, R. Karri, O. Sinanoglu","doi":"10.1109/ICCD.2013.6657045","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657045","url":null,"abstract":"Memristors are an attractive option for use in future memory architectures due to their non-volatility, low power operation, compactness and ability to store multiple bits in a single cell. Notwithstanding these advantages, memristors and memristor-based memories are prone to high defect densities due to the non-deterministic nature of nanoscale fabrication. As a first step, we will examine the defect mechanisms in multi-level cells (MLC) using memristors and develop efficient fault models. We will also investigate efficient test techniques for multi-level memristor based memories. The typical approach to testing a memory subsystem entails testing one memory cell at a time. This is time consuming and does not scale for dense, memristor-based memories. We propose an efficient testing technique to test memristor-based memories. The proposed scheme uses sneak paths inherent in crossbar memories to test multiple memristors at the same time and thereby reduces the test time by 27%.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"474 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126316889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On design vulnerability analysis and trust benchmarks development","authors":"H. Salmani, M. Tehranipoor, R. Karri","doi":"10.1109/ICCD.2013.6657085","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657085","url":null,"abstract":"The areas of hardware security and trust have experienced major growth over the past several years. However, research in Trojan detection and prevention lacks standard benchmarks and measurements, resulting in inconsistent research outcomes, and ambiguity in analyzing strengths and weaknesses in the techniques developed by different research teams and their advancements to the state-of-the-art. We have developed innovative methodologies that, for the first time, more effectively address the problem. We have developed a vulnerability analysis flow. The flow determines hard-to-detect areas in a circuit that would most probably be used for Trojan implementation to ensure a Trojan goes undetected during production test and extensive functional test analysis. Furthermore, we introduce the Trojan detectability metric to quantify Trojan activation and effect. This metric offers a fair comparison for analyzing weaknesses and strengths of Trojan detection techniques. Using these methodologies, we have developed a large number of trust benchmarks that are available for use by the public, as well as researchers and practitioners in the field.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115076531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jason M. Allred, Sanghamitra Roy, Koushik Chakraborty
{"title":"Long term sustainability of differentially reliable systems in the dark silicon era","authors":"Jason M. Allred, Sanghamitra Roy, Koushik Chakraborty","doi":"10.1109/ICCD.2013.6657027","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657027","url":null,"abstract":"As transistor miniaturization continues, providing robustness and computational correctness comes with rising power, performance, and area overhead costs. However, the diversity of software error tolerance is increasing as modern society embraces ubiquitous computing. This diversity can be exploited by differentially reliable (DR) multicore systems. The rising level of dark silicon-the portion of a chip that must remain inactive due to power budget constraints-makes such DR systems even more attractive when compared to homogeneous designs because power efficiency is improved with the increased flexibility of dynamically selecting appropriate cores for a given software workload. However, ensuring the long-term sustainability of these DR systems is a profound challenge. Asymmetric utilization of cores, differential aging degradation, and manufacturing process variation alter the relative reliability of DR system components, degrading and even eliminating the energy efficiency advantage. In this paper, we propose a feedback control based thread-to-core mapping framework to ensure longterm sustainability and extend the energy efficiency of a DR system. Over a ten-year lifespan, we analyze our approach on two DR design techniques and respectively demonstrate 14.4-16.3% and 26.1-31.0% in sustained energy-efficiency benefits, surpassing the recently proposed race-to-idle approach.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115060536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Madhushika M. E. Karunarathna, Yu-Chu Tian, C. Fidge, R. Hayward
{"title":"Algorithm clustering for multi-algorithm processor design","authors":"Madhushika M. E. Karunarathna, Yu-Chu Tian, C. Fidge, R. Hayward","doi":"10.1109/ICCD.2013.6657080","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657080","url":null,"abstract":"An Application Specific Instruction-set Processor (ASIP) is a specialized processor tailored to run a particular application/s efficiently. However, when there are multiple candidate applications in the application's domain it is difficult and time consuming to find optimum set of applications to be implemented. Existing ASIP design approaches perform this selection manually based on a designer's knowledge. We help in cutting down the number of candidate applications by devising a classification method to cluster similar applications based on the special-purpose operations they share. This provides a significant reduction in the comparison overhead while resulting in customized ASIP instruction sets which can benefit a whole family of related applications. Our method gives users the ability to quantify the degree of similarity between the sets of shared operations to control the size of clusters. A case study involving twelve algorithms confirms that our approach can successfully cluster similar algorithms together based on the similarity of their component operations.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121764990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic AC-scheduling for hardware cores with unknown and uncertain information","authors":"S. Lovergine, Fabrizio Ferrandi","doi":"10.1109/ICCD.2013.6657086","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657086","url":null,"abstract":"Modern hardware cores necessarily have to deal with many sources of unknown or uncertain information. Components with variable latency and unpredictable behavior are becoming predominant in hardware designs. Conventional hardware cores underperform when dealing with unknown or uncertain information. Common High-Level Synthesis (HLS) approaches, which require to specify the complete behavior at design-time, present significant restrictions in supporting this kind of conditions. The literature proposes several dynamic scheduling techniques to improve the cores performance by handling inherent uncertainty of applications. However, they do not address other sources of unknown information. In this paper, we propose the dynamic Activating Conditions (AC)-scheduling: a methodology for the design automation of hardware cores which can dynamically adapt the instructions scheduling according to behaviors unknown at design-time. Neither assumptions about components latency nor worst case approach are required. Experimental results show significant performance increase, with limited area overhead, with respect to state-of-the-art approaches.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133824432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data placement in HPC architectures with heterogeneous off-chip memory","authors":"Milan Pavlović, Nikola Puzovic, Alex Ramírez","doi":"10.1109/ICCD.2013.6657042","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657042","url":null,"abstract":"The performance of HPC applications is often bounded by the underlying memory system's performance. The trend of increasing the number of cores on a chip imposes even higher memory bandwidth and capacity requirements. The limitations of traditional memory technologies are pushing research in the direction of hybrid memory systems that, besides DRAM, include one or more modules based on some of the higher-density non-volatile memory technologies, where one of them will provide the required bandwidth, while the other will provide the required capacity for the application. This creates many challenges with data placement and migration policies between the modules of such hybrid memory system. In this paper, we propose an architecture with a hybrid memory design that places two technologically different memory modules in a flat address space. On such system, we evaluate several HPC workloads against different data placement and migration policies, compare their performance by means of execution time and the number of non-volatile memory writes, and consider how it can be applied to the future HPC architectures. Our results show that the hybrid memory system with dynamic page migration and limited DRAM capacity, can achieve performance that is comparable to a hypothetical, hard to implement, DRAM-only system.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130059655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}