{"title":"Operating systems for many-core systems","authors":"Hendrik Borghorst, O. Spinczyk","doi":"10.1049/pbpc022e_ch3","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch3","url":null,"abstract":"The ongoing trend toward many-core computer systems and adequate new programming models has spawned numerous new activities in the domain of operating system (OS) research during recent years. This chapter will address the challenges and opportunities for OS developers in this new field and give an overview of state-of-the-art research.This section will introduce the reader to the spectrum of contemporary many-core CPU architectures, application programming models for many-core systems, give a brief overview of the resulting challenges for OS developers.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121694724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupling the programming model from resource management in throughput processors","authors":"Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, S. Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, O. Mutlu","doi":"10.1049/pbpc022e_ch4","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch4","url":null,"abstract":"This chapter introduces a new resource virtualization framework, Zorua, that decouples the graphics processing unit (GPU) programming model from the management of key on-chip resources in hardware to enhance programming ease, portability, and performance. The application resource specification-a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block-forms a critical component of the existing GPU programming models. This specification determines the parallelism, and, hence, performance of the application during execution because the corresponding on-chip hardware resources are allocated and managed purely based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance, as we demonstrate in this chapter using real data obtained on state-of-the-art GPU systems. Our goal in this work is to reduce the dependence of performance on the software-provided static resource specification to simultaneously alleviate the above challenges. To this end, we introduce Zorua, a new resource virtualization framework, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. The virtualization provided by Zorua builds on two key concepts-dynamic allocation of the on-chip resources and their oversubscription using a swap space in memory. Zorua provides a holistic GPU resource virtualization strategy designed to (i) adaptively control the extent of oversubscription and (ii) coordinate the dynamic management of multiple on-chip resources to maximize the effectiveness of virtualization.We demonstrate that by providing the illusion of more resources than physically available via controlled and coordinated virtualization, Zorua offers several important benefits: (i) Programming ease. It eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability. It alleviates the necessity of retuning an application's resource usage when porting the application across GPU generations. (iii) Performance. By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources. 
The holistic virtualization provided by Zorua has many other potential uses, e.g., fine-grained resource sharing among multiple kernels, low latency preemption of GPU programs, and support for dynamic parallelism, which we describe in this chapter.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127764605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
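To make the coupling problem concrete, the following sketch (not taken from the chapter; the constants, names and the swap policy are all illustrative) models in software the essence of what the abstract describes: a per-block static resource request is satisfied dynamically, with any shortfall oversubscribed to a swap area instead of limiting how many blocks can run.

/* Illustrative-only software model of dynamic allocation plus oversubscription
 * for a statically specified per-block scratchpad requirement. This is a
 * conceptual sketch, not the hardware mechanism presented in the chapter. */
#include <stdio.h>

#define PHYS_SCRATCHPAD (48 * 1024)      /* physically available scratchpad, bytes */

static unsigned scratchpad_in_use = 0;   /* on-chip bytes currently allocated      */
static unsigned scratchpad_swapped = 0;  /* bytes backed by the swap space in memory */

/* Admit a block: give it on-chip space if possible, otherwise oversubscribe. */
static void admit_block(unsigned threads, unsigned scratch_bytes)
{
    unsigned free_bytes = PHYS_SCRATCHPAD - scratchpad_in_use;
    unsigned on_chip = scratch_bytes <= free_bytes ? scratch_bytes : free_bytes;
    unsigned swapped = scratch_bytes - on_chip;

    scratchpad_in_use += on_chip;
    scratchpad_swapped += swapped;
    printf("block(%u threads): %u B on chip, %u B oversubscribed to swap\n",
           threads, on_chip, swapped);
}

int main(void)
{
    /* The static specification asks for 32 KiB per block; the manager decides
     * dynamically how much of that actually lives on chip at any time. */
    for (int b = 0; b < 3; b++)
        admit_block(256, 32 * 1024);
    return 0;
}

With a purely static scheme, the third block could not be admitted at all; the point of the decoupling is that admission is no longer dictated by the worst-case declared usage.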
{"title":"From power-efficient to power-driven computing","authors":"R. Shafik, A. Yakovlev","doi":"10.1049/pbpc022e_ch11","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch11","url":null,"abstract":"The dramatic spread of computing, at the scale of trillions of ubiquitous devices, is delivering on the pervasive penetration into the real world in the form of Internet of Things (IoT). Today, the widely used power-efficient paradigms directly related to the behaviour of computing systems are those of real-time (working to deadlines imposed from the real world) and low-power (prolonging battery life or reducing heat dissipation and electricity bills). None of these addresses the strict requirements on power supply, allocation and utilisation that are imposed by the needs of new devices and applications in the computing swarm - many of which are expected to be confronted with challenges of autonomy and battery-free long life. Indeed, we need to design and build systems for survival, operating under a wide range of power constraints; we need a new power-driven paradigm called real-power computing (RPC). The article provides an overview of this emerging paradigm with definition, taxonomies and a case study, together with a summary of the existing research. Towards the end, the overview leads to research and development challenges and opportunities surfacing this paradigm. Throughout the article, we have used the power and energy terms as follows. From the supply side, the energy term will be used to refer to harvesters with built-in storage, while the power term will indicate instantaneous energy dispensation. For the computing logic side, the energy term will define the total power consumed over a given time interval.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117006621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPC with many core processors","authors":"X. Martorell, Jorge Bellón, Víctor López, Vicencc Beltran, Sergi Mateo, Xavier Teruel, E. Ayguadé, Jesús Labarta","doi":"10.1049/pbpc022e_ch1","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch1","url":null,"abstract":"The current trends in building clusters and supercomputers are to use medium-to-big symmetric multi-processors (SMP) nodes connected through a high-speed network. Applications need to accommodate to these execution environments using distributed and shared memory programming, and thus become hybrid. Hybrid applications are written with two or more programming models, usually message passing interface (MPI) [1,2] for the distributed environment and OpenMP [3,4] for the shared memory support. The goal of this chapter is to show how the two programming models can be made interoperable and ease the work of the programmer. Thus, instead of asking the programmers to code optimizations targeting performance, it is possible to rely on the good interoperability between the programming models to achieve high performance. For example, instead of using non-blocking message passing and double buffering to achieve computation-communication overlap, our approach provides this feature by taskifying communications using OpenMP tasks [5,6].","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123227063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive I/O for 3D-integrated many-core system","authors":"Hao Yu, Sai Manoj Pudukotai Dinakarrao, Hantao Huang","doi":"10.1049/pbpc022e_ch19","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch19","url":null,"abstract":"Increasing demands to process large amounts of data in real time leads to an increase in the many-core microprocessors, which is posing a grand challenge for an effective and management of available resources. As communication power occupies a significant portion of power consumption when processing such big data, there is an emerging need to devise a methodology to reduce the communication power without sacrificing the performance. To address this issue, we introduce a cognitive I/O designed toward 3D-integrated many-core microprocessors that performs adaptive tuning of the voltage-swing levels depending on the achieved performance and power consumption. We embed this cognitive I/O in a many-core microprocessor with DRAM memory partitioning to perform energy saving for application such as fingerprint matching and face recognition.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128259754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-testing of multicore processors","authors":"M. Skitsas, Marco Restifo, M. Michael, Nicopoulos Chrysostomos, P. Bernardi, Sanchez Ernesto","doi":"10.1049/PBPC022E_CH15","DOIUrl":"https://doi.org/10.1049/PBPC022E_CH15","url":null,"abstract":"The purpose of this chapter is to develop a review of state-of-the-art techniques and methodologies for the self-testing of multicore processors. The chapter is divided into two main sections: (a) self-testing solutions covering general-purpose multicore microprocessors such as chip multiprocessors (CMPs) and (b) self-testing solutions targeting application-specific multicore designs known as SoCs. In the first section (general-purpose), a taxonomy of current self-testing approaches is initially presented, followed by a review of the state-of-the-art for each class. The second section (application-specific) provides an overview of the test scheduling flows for multicore SoCs, as well as the testing strategies for the individual components (sub-systems) of such systems.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129234764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From irregular heterogeneous software to reconfigurable hardware","authors":"John Wickerson, G. Constantinides","doi":"10.1049/pbpc022e_ch2","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch2","url":null,"abstract":"A heterogeneous system is the one that incorporates more than one kind of computing device. Such a system can offer better performance per Watt than a homogeneous one if the applications it runs are programmed to take advantage of the different strengths of the different devices in the system. A typical heterogeneous setup involves a master processor (the `host' CPU) offloading some easily parallelised computations to a graphics processing unit (GPU) or to a custom accelerator implemented on a field-programmable gate array (FPGA).This arrangement can benefit performance because it exploits the massively parallel natures of GPU and FPGA architectures.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132000602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing portable embedded software for multicore systems through formal abstraction and refinement","authors":"Asieh Salehi Fathabadi, Mohammadsadegh Dalvandi, M. Butler","doi":"10.1049/PBPC022E_CH14","DOIUrl":"https://doi.org/10.1049/PBPC022E_CH14","url":null,"abstract":"Run-time management (RTM) systems are used in embedded systems to dynamically adapt hardware performance to minimise energy consumption. An RTM system implementation is coupled with the hardware platform specifications and is implemented individually for each specific platform. A significant challenge is that RTM software can require laborious manual adjustment across different hardware platforms due to the diversity of architecture characteristics. Hardware specifications vary from one platform to another and include a number of characteristic such as the number of supported voltage and frequency (VF) settings. Formal modelling offers the potential to simplify the management of platform diversity by shifting the focus away from handwritten platform-specific code to platform-independent models from which platform-specific implementations are automatically generated. The article presents an overview of the motivations for this work. It goes on to overview the RTM architecture and requirements and introduce the Event-B formal method and its tool support. The article then describes the Event-B model of two different RTMs and presents the portability support provided by formal modelling and code generation. Finalyy, it reviews the verification and experimental results.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117315290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling many-core architectures","authors":"Guihai Yan, Jiajun Li, L. Xiaowei","doi":"10.1049/pbpc022e_ch12","DOIUrl":"https://doi.org/10.1049/pbpc022e_ch12","url":null,"abstract":"Architectural modelling has two primary objectives: (1) navigating the design space exploration, i.e. guiding the architects to arrival at better design choices, and (2) facilitating dynamic management, i.e. providing the functional relationships between workloads'characteristics and architectural configurations to enable appropriate runtime hardware/software adaptations. In the past years, many-core architectures, as a typical computing fabric evolving from the monolithic single-/multicore architectures, have been shown to be scalable to uphold the staggering the Moore's Law. The many-core architectures enable two orthogonal approaches, scale-up and scale-out, to utilize the growing budget of transistors. Understanding the rationale behind these approaches is critical to make more efficient use of the powerful computing fabric.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126393657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}