2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Latest Publications

MLNoC: A Machine Learning Based Approach to NoC Design
N. Rao, Akshay Ramachandran, Amish Shah
{"title":"MLNoC: A Machine Learning Based Approach to NoC Design","authors":"N. Rao, Akshay Ramachandran, Amish Shah","doi":"10.1109/CAHPC.2018.8645914","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645914","url":null,"abstract":"Modern systems-on-chip (SoCs) are becoming increasingly complex with a growing number of CPUs, caches, accelerators, memory and I/O subsystems. For such designs, a packet-based, distributed network-on-chip (NoC) interconnect can provide scalability, performance and efficiency. However, the design of such a NoC involves optimizing a large number of variables such as topology, routing choices, arbitration and quality of service (QoS) policies, buffer sizes, and deadlock avoidance policies. Widely varying die sizes, power, floorplan and performance constraints across a variety of different market segments, ranging from high-end servers to low-end IoT devices, impose additional design challenges. In this paper we demonstrate that there is a strong correlation between SoC characteristics and good NoC design practices. However, this correlation is highly non-linear and multidimensional, with dimensions indicative of the features of the SoC, design goals and properties of the NoC. This results in a high-dimensional NoC design space and a complex search process which is inefficient to solve with classic algorithms. Using a variety of real SoCs and training data sets, we demonstrate that a machine learning (ML) based approach yields near-optimal NoC designs quickly. We determine a number of SoC and NoC features, describe reduction methods, and also show that a multi-model approach yields better designs. We demonstrate that for a wide variety of SoCs, ML-based NoC designs are far superior to those designed and optimized manually over years on almost all quality metrics.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125498329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Exploiting Limited Access Distance for Kernel Fusion Across the Stages of Explicit One-Step Methods on GPUs
Matthias Korch, Tim Werner
{"title":"Exploiting Limited Access Distance for Kernel Fusion Across the Stages of Explicit One-Step Methods on GPUs","authors":"Matthias Korch, Tim Werner","doi":"10.1109/CAHPC.2018.8645892","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645892","url":null,"abstract":"The performance of explicit parallel methods solving large systems of ordinary differential equations (ODEs) on GPUs is often memory bound. Therefore, locality optimizations, such as kernel fusion, are desirable. This paper exploits a special property of a large class of right-hand-side (RHS) functions to enable the fusion of computations of blocks of components across multiple stages of the method. This leads to a tiling of the stages within one time step. Our approach is based on a representation of the ODE method by a data flow graph and allows efficient GPU code with fused kernels to be generated automatically for user-defined tilings. In particular, we investigate two generalized tiling strategies, trapezoidal and hexagonal tiling, which are evaluated experimentally for several different high-order Runge-Kutta (RK) methods.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126552950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Optimization of a Sparse Grid-Based Data Mining Kernel for Architectures Using AVX-512
Paul-Cristian Sarbu, H. Bungartz
{"title":"Optimization of a Sparse Grid-Based Data Mining Kernel for Architectures Using AVX-512","authors":"Paul-Cristian Sarbu, H. Bungartz","doi":"10.1109/CAHPC.2018.8645913","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645913","url":null,"abstract":"Sparse grids have already been successfully used in various high-performance computing (HPC) applications, including data mining. In this article, we take a legacy classification kernel previously optimized for the AVX2 instruction set and investigate the benefits of using the newer AVX-512-based multi- and many-core architectures. In particular, the Knights Landing (KNL) processor is used to study the possible performance gains of the code. Not all kernels benefit equally from such architectures; therefore, choices in optimization steps and KNL cluster and memory modes need to be filtered through the lens of the code implementation at hand. With a less traditional approach of manual vectorization through instruction-level intrinsics, our kernel provides a differently faceted look into the optimization process. Observations stem from results obtained for node- and cluster-level classification simulations with up to 2^28 multidimensional training data points, using the CooLMUC-3 cluster of the Leibniz Supercomputing Center (LRZ) in Garching, Germany.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131188887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Exploring the Potential of Next Generation Software-Defined In-Memory Frameworks
Shouwei Chen, I. Rodero
{"title":"Exploring the Potential of Next Generation Software-Defined In-Memory Frameworks","authors":"Shouwei Chen, I. Rodero","doi":"10.1109/CAHPC.2018.8645858","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645858","url":null,"abstract":"As in-memory data analytics become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms faces significant challenges related to storage latency and memory size constraints. These challenges can be resolved by adopting new and effective formulations and novel architectures such as software-defined infrastructure. This paper investigates the key issue of data persistency for in-memory processing systems by evaluating persistence methods using different storage and memory devices for Apache Spark and the use of Alluxio. It also proposes and evaluates via simulation a Spark execution model for using disaggregated off-rack memory and non-volatile memory targeting next-generation software-defined infrastructure. Experimental results provide better understanding of behaviors and requirements for improving data persistence in current in-memory systems and provide data points to better understand requirements and design choices for next-generation software-defined infrastructure. The findings indicate that in-memory processing systems can benefit from ongoing software-defined infrastructure implementations; however, current frameworks need to be enhanced appropriately to run efficiently at scale.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129638747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploring Self-Adaptivity Towards Performance and Energy for Time-Stepping Methods
Natalia Kalinnik, R. Kiesel, T. Rauber, Marcel Richter, G. Rünger
{"title":"Exploring Self-Adaptivity Towards Performance and Energy for Time-Stepping Methods","authors":"Natalia Kalinnik, R. Kiesel, T. Rauber, Marcel Richter, G. Rünger","doi":"10.1109/CAHPC.2018.8645887","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645887","url":null,"abstract":"Time-stepping simulation methods offer potential for self-adaptivity, since the first time steps of the simulation can be used to explore the hardware characteristics and measure which of several available implementation variants leads to good performance and energy consumption on the given hardware platform. The version with the best performance or the smallest energy consumption can then be used for the remaining time steps. However, the number of variants to test may be quite large and different simulation methods may require different approaches for self-adaptivity. In this article, we explore the potential for self-adaptivity of several methods from scientific computing. In particular, we consider particle simulation methods, solution methods for differential equations, as well as sparse matrix computations and explore the potential for self-adaptivity of these methods, considering both performance and energy consumption as target functions.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130499319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability
Luis Mattos, D. C. S. Lucas, Juan Salamanca, J. P. L. Carvalho, M. Pereira, G. Araújo
{"title":"DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability","authors":"Luis Mattos, D. C. S. Lucas, Juan Salamanca, J. P. L. Carvalho, M. Pereira, G. Araújo","doi":"10.1109/CAHPC.2018.8645904","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645904","url":null,"abstract":"Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g. reduction). These restrictions preclude the parallelization of many computationally intensive DOACROSS loops. In such loops, either the compiler finds at least one loop-carried dependence or it cannot prove, at compile-time, that the loop is free of such dependences, even though they might never show up at runtime. In any case, most compilers end up not parallelizing DOACROSS loops. This paper brings three contributions to address this problem. First, it integrates three algorithms (TLS, DOAX, and BDX) into a simple OpenMP clause that enables the programmer to select the best algorithm for a given loop. Second, it proposes an annotation approach to separate the sequential components of a loop, thus exposing other components to parallelization. Finally, it shows that loop-carried probability is an effective metric to decide when to use TLS or other non-speculative techniques (e.g. DOAX or BDX) to parallelize DOACROSS loops. Experimental results reveal that, for certain loops, slow-downs can be transformed into 2× speed-ups by quickly selecting the appropriate algorithm.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126405858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Assessing Time Predictability Features of ARM big.LITTLE Multicores
Gabriel Fernandez, F. Cazorla, J. Abella, Sylvain Girbal
{"title":"Assessing Time Predictability Features of ARM big.LITTLE Multicores","authors":"Gabriel Fernandez, F. Cazorla, J. Abella, Sylvain Girbal","doi":"10.1109/CAHPC.2018.8645925","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645925","url":null,"abstract":"The increasing performance needs in critical real-time embedded systems (CRTES), for instance in the automotive domain, push for the adoption of high-performance hardware from the consumer electronics domain. However, their time-predictability features are quite unexplored. The ARM big.LITTLE architecture is a good candidate for adoption in the CRTES market (e.g. it has already started being used in the automotive market). In this paper we study ARM big.LITTLE's capabilities to meet CRTES requirements. In particular, we perform a qualitative and quantitative assessment of its timing characteristics, focusing on shared multicore resources, and how this architecture can be reliably used in CRTES.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"365 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113998304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks
Christina Diedhiou, Bryan Carpenter, A. Shafi, Soumabha Sarkar, Ramazan Esmeli, Ryan Gadsdon
{"title":"Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks","authors":"Christina Diedhiou, Bryan Carpenter, A. Shafi, Soumabha Sarkar, Ramazan Esmeli, Ryan Gadsdon","doi":"10.1109/CAHPC.2018.8645926","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645926","url":null,"abstract":"One of the challenges our society faces is the ever increasing amount of data. Among existing platforms that address the system requirements, Hadoop is a framework widely used to store and analyze “big data”. On the human side, one of the aids to finding the things people really want is recommendation systems. This paper evaluates highly scalable parallel algorithms for recommendation systems with application to very large data sets. A particular goal is to evaluate an open source Java message passing library for parallel computing called MPJ Express, which has been integrated with Hadoop. As a demonstration we use MPJ Express to implement collaborative filtering on various data sets using the algorithm ALSWR (Alternating-Least-Squares with Weighted-λ-Regularization). We benchmark the performance and demonstrate parallel speedup on MovieLens and Yahoo Music data sets, comparing our results with two other frameworks: Mahout and Spark. Our results indicate that the MPJ Express implementation of ALSWR has very competitive performance and scalability in comparison with the two other frameworks.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121762575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels
Pierre Huchant, Denis Barthou, M. Counilh
{"title":"Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels","authors":"Pierre Huchant, Denis Barthou, M. Counilh","doi":"10.1109/SBAC-PAD.2018.00051","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2018.00051","url":null,"abstract":"OpenCL defines a common parallel programming language for all devices, although writing tasks adapted to the devices and managing communication and load-balancing issues are left to the programmer. We propose in this paper a static/dynamic approach for the execution of an iterated sequence of data-dependent kernels on a multi-device heterogeneous architecture. The method automatically distributes irregular kernels onto multiple devices and tackles, without training, both load-balancing and data-transfer issues arising from hardware heterogeneity, load imbalance within the application itself, and load variations between repeated executions of the sequence.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123365692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Network-Aware Energy-Efficient Virtual Machine Management in Distributed Cloud Infrastructures with On-Site Photovoltaic Production
Benjamin Camus, F. Dufossé, A. Blavette, M. Quinson, Anne-Cécile Orgerie
{"title":"Network-Aware Energy-Efficient Virtual Machine Management in Distributed Cloud Infrastructures with On-Site Photovoltaic Production","authors":"Benjamin Camus, F. Dufossé, A. Blavette, M. Quinson, Anne-Cécile Orgerie","doi":"10.1109/CAHPC.2018.8645901","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645901","url":null,"abstract":"Distributed Clouds are nowadays an essential component for providing Internet services to an ever-growing number of connected devices. This growth makes the energy consumption of these distributed infrastructures a worrying environmental and economic concern. In order to reduce energy costs and carbon footprint, Cloud providers could resort to producing on-site renewable energy, with solar panels for instance. In this paper, we propose NEMESIS: a Network-aware Energy-efficient Management framework for distributEd cloudS Infrastructures with on-Site photovoltaic production. NEMESIS optimizes VM placement and balances VM migration and green energy consumption in Cloud infrastructures embedding geographically distributed data centers with on-site photovoltaic power supply. We use the SimGrid simulation toolbox to evaluate the energy efficiency of NEMESIS against state-of-the-art approaches.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123332045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6