{"title":"Advances in power management of many-core processors","authors":"Andrea Bartolini, D. Rossi","doi":"10.1049/pbpc022e_ch8","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch8","url":null,"abstract":"","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in hardware reliability of reconfigurable many-core embedded systems
L. Bauer, Hongyan Zhang, M. Kochte, E. Schneider, H. Wunderlich, J. Henkel
In: Many-Core Computing: Hardware and Software, 2019. DOI: 10.1049/pbpc022e_ch16

Abstract: The chapter discusses the background to the most demanding dependability challenges for reconfigurable processors in many-core systems and presents a dependable runtime-reconfigurable processor for high reliability. It uses an adaptive modular redundancy technique that guarantees an application-specified level of reliability under changing SEU rates by budgeting the effective critical bits among all kernels and all accelerators of an application. This allows reconfigurable processors to be deployed in harsh environments without statically protecting them.
{"title":"Biologically-inspired massively-parallel computing","authors":"S. Furber","doi":"10.1049/pbpc022e_ch22","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch22","url":null,"abstract":"Half a century of progress in computer technology has delivered machines of formidable capability and an expectation that similar advances will continue into the foreseeable future. However, much of the past progress has been driven by developments in semiconductor technology following Moore's Law, and there are strong grounds for believing that these cannot continue at the same rate. This, and related issues, suggest that there are huge challenges ahead in meeting the expectations of future progress, such as understanding how to exploit massive parallelism and how to deliver improvements in energy efficiency and reliability in the face of diminishing component reliability. Alongside these issues, recent advances in machine learning have created a demand for machines with cognitive capabilities, for example, to control autonomous vehicles, that we will struggle to deliver. Biological systems have, through evolution, found solutions to many of these problems, but we lack a fundamental understanding of how these solutions function. If we could advance our understanding of biological systems, we would open a rich source of ideas for unblocking progress in our engineered systems. An overview is given of SpiNNaker - a spiking neural network architecture. The SpiNNaker machine puts these principles together in the form of a massively parallel computer architecture designed both to model the biological brain, in order to accelerate our understanding of its principles of operation, and also to explore engineering applications of such machines.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117192633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Silicon photonics enabled rack-scale many-core systems
Peng Yang, Zhehui Wang, Zhifei Wang, Xuanqi Chen, Luan H. K. Duong, Jiang Xu
In: Many-Core Computing: Hardware and Software, 2019. DOI: 10.1049/pbpc022e_ch18

Abstract: Ever-higher demands on computing power from scientific computation, big-data processing and deep learning are driving the emergence of exascale computing systems, built from tens of thousands of many-core nodes or more. This imposes huge performance and power challenges on different aspects of these systems. As a basic building block of high-performance computing systems, the modularized rack will play a significant role in addressing these challenges. In this chapter, we introduce rack-scale optical networks (RSON), a silicon-photonics-enabled inter/intra-chip network for rack-scale many-core systems. RSON leverages the fact that most traffic stays within a rack, where a high-bandwidth, low-latency rack-scale optical network can improve both performance and energy efficiency. We codesign the intra-chip and inter-chip optical networks together with an optical inter-node interface to provide balanced access to both local memory and remote nodes' memory, letting the nodes within a rack cooperate effectively. The evaluations show that RSON improves overall performance and energy efficiency dramatically; specifically, it delivers up to 5.4x more performance at the same energy consumption compared to a traditional InfiniBand-connected rack.
Tools and workloads for many-core computing
A. Singh, P. Dziurzański, G. Merrett, B. Al-Hashimi
In: Many-Core Computing: Hardware and Software, 2019. DOI: 10.1049/pbpc022e_ch5

Abstract: Evaluating any computing system requires appropriate tools and workloads, which enable designers to verify that the system delivers the properties end-users expect. Multi/many-core chips are now omnipresent in systems of all scales, from mobile phones to data centres, and reliance on them is growing because they provide the high processing capability demanded by increasingly complex applications across many domains. That capability comes from parallel processing: an application must be partitioned into tasks or threads, which must then be allocated efficiently onto the different cores. The applications used for evaluation constitute the workloads, and the toolchains that facilitate the evaluation are referred to as the tools. The tools realize actions such as thread-to-core mapping and voltage/frequency control (governed by the OS scheduler and the power governor, respectively) and expose their effect on performance monitoring counters, and thereby on the performance metrics that concern end-users, such as energy consumption and execution time.
Approximate computing across the hardware and software stacks
M. Shafique, O. Hasan, R. Hafiz, Sana Mazahir, Muhammad Abdullah Hanif, Semeen Rehman
In: Many-Core Computing: Hardware and Software, 2019. DOI: 10.1049/pbpc022e_ch20

Abstract: Emerging fields like big data and IoT pose a number of challenges for both the hardware and software design communities, chief among them scaling computational and memory resources, and the efficiency of processing devices, to keep pace with growing needs. Several research directions have emerged in recent years to address these challenges. We focus on a prominent paradigm that can improve resource efficiency regardless of the underlying technology: approximate computing (AC). AC relaxes the bounds of exact computing to open new opportunities for gains in energy, power, performance, and/or area efficiency at the cost of reduced output quality, typically kept within a tolerable range. We first provide an overview of AC and the techniques commonly employed at different abstraction levels to reduce the resource requirements of computationally intensive applications. We then discuss component-level approximations and their probabilistic behaviour in detail, using approximate adders and multipliers as examples, followed by a methodology for constructing efficient accelerators from these components. The discussion is then extended to approximate memories and runtime management systems. Toward the end of the chapter, we present a methodology for designing energy-efficient many-core systems based on approximate components, followed by the challenges in adopting a cross-layer approach to designing highly energy-, power-, and performance-efficient systems.
{"title":"Many-core systems for big-data computing","authors":"S. Ainsworth, Timothy M. Jones","doi":"10.1049/pbpc022e_ch21","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch21","url":null,"abstract":"In many ways, big data should be the poster-child of many-core computing. By necessity, such applications typically scale extremely well across machines, featuring high levels of thread-level parallelism. Programming techniques, such as Google's MapReduce, have allowed many applications running in the data centre to be programmed with parallelism directly in mind and have enabled extremely high throughput across machines. We explore the state-of-the-art in terms of techniques used to make many-core architectures work for big-data workloads. We explore how tail-latency concerns mean that even though workloads are parallel, high performance is still necessary in at least some parts of the system. We take a look at how memory-system issues can cause some big-data applications to scale less favourably than we would like for many-core architectures. We examine the programming models used for big-data workloads and consider how these both help and hinder the typically complex mapping seen elsewhere for many-core architectures. And we also take a look at the alternatives to traditional many-core systems in exploiting parallelism for efficiency in the big-data space.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"02 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121456350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive-reflective middleware for power and energy management in many-core heterogeneous systems","authors":"T. Muck, A. Rahmani, N. Dutt","doi":"10.1049/pbpc022e_ch7","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch7","url":null,"abstract":"","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131203113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive packet processing on CPU-GPU heterogeneous platforms
Arian Maghazeh, P. Eles, Zebo Peng, A. Andrei, Unmesh D. Bordoloi, Usman Dastgeer
In: Many-Core Computing: Hardware and Software, 2019. DOI: 10.1049/pbpc022e_ch10