{"title":"Hardware neural network accelerators","authors":"O. Temam","doi":"10.1109/CODES-ISSS.2013.6659008","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659008","url":null,"abstract":"Because of increasingly stringent energy constraints (e.g., Dark Silicon), there is a growing consensus in the community that we may be moving towards heterogeneous multi-core architectures, composed of a mix of cores and accelerators. Because our community is traditionally focused on general-purpose computing, we have been especially considering accelerator approaches such as GPUs and reconfigurable circuits. An attractive alternative is to investigate accelerators which are focused on a few key algorithms: key algorithms still mean broad application scope, but few algorithms enable energy efficient and cost-effective accelerators.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126722035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design space exploration and parameter tuning for neuromorphic applications","authors":"Kristofor D. Carlson, J. Nageswaran, N. Dutt, J. Krichmar","doi":"10.5555/2555692.2555712","DOIUrl":"https://doi.org/10.5555/2555692.2555712","url":null,"abstract":"Large-scale spiking neural networks (SNNs) have been used to successfully model complex neural circuits that explore various neural phenomena such as learning and memory, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware as spiking events are often sparse, leading to a potentially large reduction in both bandwidth requirements and power usage. The inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs but has also made the task of tuning these biologically realistic SNNs difficult. We present an automated parameter-tuning framework capable of tuning large-scale SNNs quickly and efficiently using evolutionary algorithms (EA) and off-the-shelf graphics processing units (GPUs).","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114061587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DHeating: Dispersed heating repair for self-healing NAND flash memory","authors":"Renhai Chen, Yi Wang, Z. Shao","doi":"10.1109/CODES-ISSS.2013.6658994","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6658994","url":null,"abstract":"Short lifetimes are becoming a critical issue in NAND flash memory with the advent of multi-level cell and triple-level cell flash memory. Researchers at Macronix have recently discovered that heating can cause worn-out NAND flash cells to become reusable and greatly prolong the lifetime of flash memory cells. However, the heating process consumes a substantial amount of power. This means that some fundamental changes are required if existing NAND flash management techniques are to be applied in self-healing NAND flash memory. In particular, all existing wear-leveling techniques are based on the principle of evenly distributing writes and erases. This causes NAND flash cells to tend to wear out in a short time period. Moreover, healing these cells in a concentrated manner may cause power outages in mobile devices. In this paper, we propose for the first time a new wear-leveling scheme called DHeating (Dispersed Heating) to solve the concentrated heating problem in self-healing flash memory. In DHeating, rather than evenly distributing writes and erases over a time period, write and erase operations are concentrated on a small portion of flash memory cells, so that these cells can be worn out and healed by heating first. In this way, we can disperse healing to avoid the problem of concentrated power usage caused by heating. Furthermore, with the very long lifetime that results from self-healing, we can sacrifice lifetime for reliability. Therefore, we propose an early heating strategy to solve the reliability problem caused by concentrated heating. The idea is to start the healing process earlier by heating NAND flash cells before their expected endurance. We evaluate our scheme based on a real embedded platform. The experimental results show that our scheme can effectively solve the concentrated heating problem.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133977967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-mode monitoring for mixed-criticality real-time systems","authors":"M. Neukirchner, Kai Lampka, Sophie Quinton, R. Ernst","doi":"10.1109/CODES-ISSS.2013.6659021","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659021","url":null,"abstract":"We present a scheme for monitoring activation patterns of multiple tasks in mixed-criticality real-time systems. Unlike previous approaches, which enforce a single pre-defined activation pattern bound per task, we propose a multi-mode approach, where monitors can dynamically switch between different configurations, depending on the observed activation pattern at other tasks. The required configurations are based on real-time interfaces which we determine through sensitivity analysis. In an evaluation, we show that switching between monitor configurations makes it possible to dynamically reassign timing slack between tasks, thereby achieving better resource utilization while still providing the same timing guarantees.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133040991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters","authors":"Francesco Conti, A. Marongiu, L. Benini","doi":"10.1109/CODES-ISSS.2013.6658992","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6658992","url":null,"abstract":"Several many-core designs tackle scalability issues by leveraging tightly-coupled clusters as building blocks, where low-latency, high-bandwidth interconnection between a small/medium number of cores and L1 memory achieves high performance/watt. Tight coupling of hardware accelerators into these multicore clusters constitutes a promising approach to further improve performance/area/watt. However, accelerators are often clocked at a lower frequency than processor clusters for energy efficiency reasons. In this paper, we propose a technique to integrate shared-memory accelerators within the tightly-coupled clusters of the STMicroelectronics STHORM architecture. Our methodology significantly relaxes timing constraints for tightly-coupled accelerators, while optimizing data bandwidth. In addition, our technique allows the accelerator to operate at an integer submultiple of the cluster frequency. Experimental results show that the proposed approach recovers up to 84% of the slow-down implied by reduced accelerator speed.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124174120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bio-inspired ultra lower-power neuromorphic computing engine for embedded systems","authors":"Beiye Liu, Miao Hu, Hai Helen Li, Yiran Chen, C. Xue","doi":"10.1109/CODES-ISSS.2013.6659010","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659010","url":null,"abstract":"Neuromorphic computing, which is inspired by the working mechanisms of the human brain, has recently emerged as a hot research area to combat the contradiction between the limited functions of computing systems and the ever-increasing variety of applications. In this work, we introduce our research on a bio-inspired neuromorphic embedded computing engine named Centaur, which aims to achieve an ultra-high power efficiency beyond One-TeraFlops-Per-Watt by adopting the bio-inspired computation model and the advanced memristor technology. The success of the Centaur design may improve embedded system power efficiency by three orders of magnitude from the current level, while the small footprint and real-time re-configurability of the design allow easy integration into MPSoCs, enabling many emerging mobile and embedded applications.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115132529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMSM: An efficient and effective Code Management for Software Managed Multicores","authors":"Ke Bai, Jing Lu, Aviral Shrivastava, Bryce Holton","doi":"10.1109/CODES-ISSS.2013.6658998","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6658998","url":null,"abstract":"As we scale the number of cores in a multicore processor, scaling the memory hierarchy is a major challenge. Software Managed Multicore (SMM) architectures are one of the promising solutions. In an SMM architecture, there are no caches, and each core has only a local scratchpad memory. If all the code and data of the task mapped to a core do not fit on its local scratchpad memory, then explicit code and data management is required. In this paper, we solve the problem of efficiently managing code on an SMM architecture. We extend the state of the art by: i) correctly calculating the code management overhead, ii) even in the presence of branches in the task, and iii) developing a heuristic CMSM (Code Mapping for Software Managed multicores) that results in efficient code management execution on the local scratchpad memory. Our experimental results, collected after executing applications from the MiBench suite [1] on the Cell SPEs (Cell is an SMM architecture) [2], demonstrate that correct management cost calculation and branch consideration can improve performance by 12%. Our heuristic CMSM can reduce runtime in more than 80% of the cases, and by up to 20% on our set of benchmarks.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"308 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129092544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panappticon: Event-based tracing to measure mobile application and platform performance","authors":"Lide Zhang, David R. Bild, R. Dick, Z. Morley Mao, P. Dinda","doi":"10.1109/CODES-ISSS.2013.6659020","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659020","url":null,"abstract":"Improving and optimizing user-perceived smartphone performance requires understanding device, system, and application behavior for real-world workloads. However, measuring such performance is challenging due to the multi-threaded, asynchronous programming paradigms used in modern applications and the multiple layers of hardware and software used to respond to user input events. We address this challenge with Panappticon, a lightweight, system-wide, fine-grained event tracing system for Android that automatically identifies critical execution paths in user transactions. Panappticon monitors the application, system, and kernel software layers and can identify performance problems stemming from application design flaws, underpowered hardware, and harmful interactions between apparently unrelated applications. We carried out a 14-user, one-month study of an Android smartphone system instrumented with Panappticon, which revealed a number of specific problems and areas for improvement that may be of interest to system designers, application developers, and device manufacturers.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122578641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters","authors":"Abbas Rahimi, A. Marongiu, Rajesh K. Gupta, L. Benini","doi":"10.1109/CODES-ISSS.2013.6659022","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659022","url":null,"abstract":"We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115954842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable NoC-based architecture of neural coding for new efficient associative memories","authors":"J. Diguet, M. Strum, Nicolas Le Griguer, Lydie Caetano, Martha Johanna Sepúlveda","doi":"10.1109/CODES-ISSS.2013.6659006","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659006","url":null,"abstract":"We present the first NoC-based hardware implementation of Neural Coding (NC), which is a new approach that opens outstanding perspectives for the design of associative memories and learning machines. We first propose optimized architectures of memories and processing elements that allow for an efficient distributed implementation. Then we introduce different NoC architectures to interconnect all elements, which provide the required scalability and take advantage of parallel transfer opportunities. Performance, cost and energy consumption tradeoffs of various NoC solutions are compared and discussed. Based on previous implementation results, we run SystemC-TLM simulations that validate the behavior of the algorithm and the efficiency of the dedicated architecture. This work demonstrates that this architecture can meet expected requirements in terms of scalability and hierarchy, and consequently that NC-based architectures are compliant with efficient hardware implementations of a new and promising model of associative memories.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127839957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}