{"title":"Bi-Objective Cost Function for Adaptive Routing in Network-on-Chip","authors":"Asma Benmessaoud Gabis;Pierre Bomel;Marc Sevaux","doi":"10.1109/TMSCS.2018.2810223","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2810223","url":null,"abstract":"This paper proposes a new fully adaptive routing protocol for 2D-mesh Network-on-Chip (NoCs). It is inspired from the A-star search algorithm and called Heuristic based Routing Algorithm (HRA). It is distributed, congestion-aware, and fault-tolerant by using only the local information of each router neighbors. HRA does not use Virtual Channels (VCs) but tries to reduce the risk of deadlock by avoiding the 2-nodes and the 4-nodes loops. HRA is based on a bi-objective weighted sum cost function. Its goal is optimizing latency and throughput. Experiments show that HRA ensures a good reliability rate despite the presence of many faulty links. In addition, our approach reports interesting latencies and average throughput values when a non-dominated solution is chosen.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 2","pages":"177-187"},"PeriodicalIF":0.0,"publicationDate":"2018-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2810223","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68025091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Methodology for Responsive and Rrobust MIMO Control of Heterogeneous Multicores","authors":"Tiago Mück;Bryan Donyanavard;Kasra Moazzemi;Amir M. Rahmani;Axel Jantsch;Nikil Dutt","doi":"10.1109/TMSCS.2018.2808524","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2808524","url":null,"abstract":"Heterogeneous multicore processors (HMPs) are commonly deployed to meet the performance and power requirements of emerging workloads. HMPs demand adaptive and coordinated resource management techniques to control such complex systems. While Multiple-Input-Multiple-Output (MIMO) control theory has been applied to adaptively coordinate resources for \u0000<italic>single-core</i>\u0000 processors, the coordinated management of HMPs poses significant additional challenges for achieving robustness and responsiveness, due to the unmanageable complexity of modeling the system dynamics. This paper presents, for the first time, a methodology to design robust MIMO controllers with rapid response and formal guarantees for coordinated management of HMPs. Our approach addresses the challenges of: (1) system decomposition and identification; (2) selection of suitable sensor and actuator granularity; and (3) appropriate system modeling to make the system identifiable as well as controllable. We demonstrate the practical applicability of our approach on an ARM big.LITTLE HMP platform running Linux, and demonstrate the efficiency and robustness of our method by designing MIMO-based resource managers.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"944-951"},"PeriodicalIF":0.0,"publicationDate":"2018-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2808524","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68024191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Maintenance of Maximal Bicliques in a Dynamic Bipartite Graph","authors":"Apurba Das;Srikanta Tirthapura","doi":"10.1109/TMSCS.2018.2802920","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2802920","url":null,"abstract":"We consider incremental maintenance of maximal bicliques from a dynamic bipartite graph that changes over time due to the addition of edges. When new edges are added to the graph, we seek to enumerate the change in the set of maximal bicliques, without enumerating the set of maximal bicliques that remain unaffected. The challenge in an efficient algorithm is to enumerate the change without explicitly enumerating the set of all maximal bicliques. In this work, we present (1) Near-tight bounds on the magnitude of change in the set of maximal bicliques of a graph, due to a change in the edge set, and an (2) Incremental algorithm for enumerating the change in the set of maximal bicliques. For the case when a constant number of edges are added to the graph, our algorithm is “change-sensitive”, i.e., its time complexity is proportional to the magnitude of change in the set of maximal bicliques. To our knowledge, this is the first incremental algorithm for enumerating maximal bicliques in a dynamic graph, with a provable performance guarantee. Our algorithm is easy to implement, and experimental results show that its performance exceeds that of baseline implementations by orders of magnitude substructures.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"231-242"},"PeriodicalIF":0.0,"publicationDate":"2018-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2802920","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Docker Container Scheduler for I/O Intensive Applications Running on NVMe SSDs","authors":"Janki Bhimani;Zhengyu Yang;Ningfang Mi;Jingpei Yang;Qiumin Xu;Manu Awasthi;Rajinikanth Pandurangan;Vijay Balakrishnan","doi":"10.1109/TMSCS.2018.2801281","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2801281","url":null,"abstract":"By using fast back-end storage, performance benefits of a lightweight container platform can be leveraged with quick I/O response. Nevertheless, the performance of simultaneously executing multiple instances of same or different applications may vary significantly with the number of containers. The performance may also vary with the nature of applications because different applications can exhibit different nature on SSDs in terms of I/O types (read/write), I/O access pattern (random/sequential), I/O size, etc. Therefore, this paper aims to investigate and analyze the performance characterization of both homogeneous and heterogeneous mixtures of I/O intensive containerized applications, operating with high performance NVMe SSDs and derive novel design guidelines for achieving an optimal and fair operation of the both homogeneous and heterogeneous mixtures. By leveraging these design guidelines, we further develop a new docker controller for scheduling workload containers of different types of applications. Our controller decides the optimal batches of simultaneously operating containers in order to minimize total execution time and maximize resource utilization. Meanwhile, our controller also strives to balance the throughput among all simultaneously running applications. We develop this new docker controller by solving an optimization problem using five different optimization solvers. We conduct our experiments in a platform of multiple docker containers operating on an array of three enterprise NVMe drives. We further evaluate our controller using different applications of diverse I/O behaviors and compare it with simultaneous operation of containers without the controller. Our evaluation results show that our new docker workload controller helps speed-up the overall execution of multiple applications on SSDs.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"313-326"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2801281","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-Arrival Rate Aware Distributed Run-Time Resource Management for Many-Core Computing Platforms","authors":"Vasileios Tsoutsouras;Sotirios Xydis;Dimitrios Soudris","doi":"10.1109/TMSCS.2018.2793189","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2793189","url":null,"abstract":"Modern many-core computing platforms execute a diverse set of dynamic workloads in the presence of varying application arrival rates. This inflicts strict requirements on run-time management to efficiently allocate system resources. On the way towards kilo-core processor architectures, centralized resource management approaches will most probably form a severe performance bottleneck, thus focus has been turned to the study of Distributed Run-Time Resource Management (DRTRM) schemes. In this article, we examine the behavior of a DRTRM of dynamic applications with malleable characteristics against stressing incoming application interval rate scenarios, using Intel SCC as the target many-core system. We show that resource allocation is highly affected by application input rate and propose an application-arrival aware DRTRM framework implementing an effective admission control strategy by carefully utilizing voltage and frequency scaling on parts of its resource allocation infrastructure. Through extensive experimental evaluation, we quantitatively analyze the behavior of the introduced DRTRM scheme and show that it achieves up to 44 percent performance gains while consuming 31 percent less energy, in comparison to a state-of-art DRTRM solution. In comparison to a centralized RTRM, the respective metric values rise up to 62 and 45 percent performance and energy gains, respectively.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"285-298"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2793189","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilevel Parallelism for the Exploration of Large-Scale Graphs","authors":"Massimo Bernaschi;Mauro Bisson;Enrico Mastrostefano;Flavio Vella","doi":"10.1109/TMSCS.2018.2797195","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2797195","url":null,"abstract":"We present the most recent release of our parallel implementation of the BFS and BC algorithms for the study of large scale graphs. Although our reference platform is a high-end cluster of new generation Nvidia GPUs and some of our optimizations are CUDA specific, most of our ideas can be applied to other platforms offering multiple levels of parallelism. We exploit multi level parallel processing through a hybrid programming paradigm that combines highly tuned CUDA kernels, for the computations performed by each node, and explicit data exchange through the Message Passing Interface (MPI), for the communications among nodes. The results of the numerical experiments show that the performance of our code is comparable or better with respect to other state-of-the-art solutions. For the BFS, for instance, we reach a peak performance of 200 Giga Teps on a single GPU and 5.5 Terateps on 1024 Pascal GPUs. We release our source codes both for reproducing the results and for facilitating their usage as a building block for the implementation of other algorithms.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"204-216"},"PeriodicalIF":0.0,"publicationDate":"2018-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2797195","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Performant Graph Processing on GPUs Using Approximate Computing","authors":"Somesh Singh;Rupesh Nasre","doi":"10.1109/TMSCS.2018.2795543","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2795543","url":null,"abstract":"Graph algorithms are being widely used in several application domains. It has been established that parallelizing graph algorithms is challenging. The parallelization issues get exacerbated when graphics processing units (GPUs) are used to execute graph algorithms. While the prior art has shown effective parallelization of several graph algorithms on GPUs, a few algorithms are still expensive. In this work, we address the scalability issues in graph parallelization. In particular, we aim to improve the execution time by tolerating a little approximation in the computation. We study the effects of four heuristic approximations on six graph algorithms with five graphs and show that if an application allows for small inaccuracy, this can be leveraged to achieve considerable performance benefits. We also study the effects of the approximations on GPU-based processing and provide interesting takeaways.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"190-203"},"PeriodicalIF":0.0,"publicationDate":"2018-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2795543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speedup and Power Scaling Models for Heterogeneous Many-Core Systems","authors":"Ashur Rafiev;Mohammed A. N. Al-Hayanni;Fei Xia;Rishad Shafik;Alexander Romanovsky;Alex Yakovlev","doi":"10.1109/TMSCS.2018.2791531","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2791531","url":null,"abstract":"Traditional speedup models, such as Amdahl's law, Gustafson's, and Sun and Ni's, have helped the research community and industry better understand system performance capabilities and application parallelizability. As they mostly target homogeneous hardware platforms or limited forms of processor heterogeneity, these models do not cover newly emerging multi-core heterogeneous architectures. This paper reports on novel speedup and energy consumption models based on a more general representation of heterogeneity, referred to as the normal form heterogeneity, that supports a wide range of heterogeneous many-core architectures. The modelling method aims to predict system power efficiency and performance ranges, and facilitates research and development at the hardware and system software levels. The models were validated through extensive experimentation on the off-the-shelf big. LITTLE heterogeneous platform and a dual-GPU laptop, with an average error of 1 percent for speedup and of less than 6.5 percent for power dissipation. A quantitative efficiency analysis targeting the system load balancer on the Odroid XU3 platform was used to demonstrate the practical use of the method.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"436-449"},"PeriodicalIF":0.0,"publicationDate":"2018-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2791531","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"$mathsf{CHOAMP}$ : Cost Based Hardware Optimization for Asymmetric Multicore Processors","authors":"Jyothi Krishna Viswakaran Sreelatha;Shankar Balachandran;Rupesh Nasre","doi":"10.1109/TMSCS.2018.2791955","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2791955","url":null,"abstract":"Heterogeneous Multiprocessors (HMPs) are popular due to their energy efficiency over Symmetric Multicore Processors (SMPs). Asymmetric Multicore Processors (AMPs) are a special case of HMPs where different kinds of cores share the same instruction set, but offer different power-performance trade-offs. Due to the computational-power difference between these cores, finding an optimal hardware configuration for executing a given parallel program is quite challenging. An inherent difficulty in this problem stems from the fact that the original program is written for SMPs. This challenge is exacerbated by the interplay of several configuration parameters that are allowed to be changed in AMPs. In this work, we propose a probabilistic method named CHOAMP to choose the bestavailable hardware configuration for a given parallel program. Selection of a configuration is guided by a user-provided run-time property such as energy-delay-product (EDP) and CHOAMP aspires to optimize the property in choosing a configuration. The core part of our probabilistic method relies on identifying the behavior of various program constructs in different classes of CPU cores in the AMP, and how it influences the cost function of choice. We implement the proposed technique in a compiler which automatically transforms a code optimized for SMP to run efficiently over an AMP, eliding requirement of any user annotations. CHOAMP transforms the same source program for different hardware configurations based on different user requirement. We evaluate the efficiency of our method for three different run-time properties: execution time, energy consumption, and EDP, in NAS Parallel Benchmarks for OpenMP. Our experimental evaluation shows that CHOAMP achieves an average of 65, 28, and 57 percent improvement over baseline HMP scheduling while optimizing for energy, execution time, and EDP, respectively.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 2","pages":"163-176"},"PeriodicalIF":0.0,"publicationDate":"2018-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2791955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68021417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Execution Trace Graph of Dataflow Process Networks","authors":"Simone Casale-Brunet;Marco Mattavelli","doi":"10.1109/TMSCS.2018.2790921","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2790921","url":null,"abstract":"The paper introduces and specifies a formalism that provides complete representations of dataflow process network (DPN) program executions, by means of directed acyclic graphs. Such graphs, also known as execution trace graphs (ETG), are composed of nodes representing each action firing and by directed arcs representing the dataflow program execution constraints between two action firings. Action firings are atomic operations that encompass the algorithmic part of the action executions applied to both, the input data and the actor state variables. The paper describes how an ETG can be effectively derived from a dataflow program, specifies the type of dependencies that need to be included, and the processing that need to be applied so that an ETG become capable of representing all the admissible trajectories that dynamic dataflow programs can execute. The paper also describes how some characteristics of the ETG, related to specific implementations of the dataflow program, can be evaluated by means of high-level and architecture-independent executions of the program. Furthermore, some examples are provided showing how the analysis of the ETGs can support efficient explorations, reductions, and optimizations of the design space, providing results in terms of design alternatives, without requiring any partial implementation or reduction of the expressiveness of the original DPN dataflow program.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 3","pages":"340-354"},"PeriodicalIF":0.0,"publicationDate":"2018-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2790921","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}