Proceedings of the 12th ACM International Conference on Computing Frontiers最新文献_第3页

Moving to memoryland: in-memory computation for existing applications 进入内存领域:现有应用程序的内存计算

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742874

P. Trancoso

{"title":"Moving to memoryland: in-memory computation for existing applications","authors":"P. Trancoso","doi":"10.1145/2742854.2742874","DOIUrl":"https://doi.org/10.1145/2742854.2742874","url":null,"abstract":"Migrating computation to memory was proposed a long time ago as a way to overcome the memory bandwidth and latency bottleneck, as well as increase the computation parallelism. While the concept had been applied to several research projects it is only recently that the technological hurdles have been solved and we are able to see products arriving the market. While in most cases we need to concentrate on developing new algorithms and porting applications to new models as to fully exploit the potentials of the new products, we will still want to be able to execute efficiently existing applications. As such, in this work we focus on the analysis of the in-memory computation characteristics of existing applications in a way to evaluate how we would be able to have them move to \"Memoryland\". We present a tool that analyses the locality of the memory accesses for the different routines in an application. The results observed from the execution of this tool on different applications are that while certain applications seem to be able to fit in a small granularity architecture (small memory-to-computation ratio), others have routines that require a large amount of data. Thus we believe that hierarchical in-memory processing architectures are a good fit for the demands of the different applications. In addition, results have shown that for most applications we can limit our analysis to the routines that issue the most memory accesses.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126879554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Heterogeneous energy-efficient cache design in warehouse scale computers 仓库规模计算机的异构节能缓存设计

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742889

Jing Wang, Xiaoyan Zhu, Yanjun Liu, Jiaqi Zhang, Minhua Wu, Wei-gong Zhang, Keni Qiu

引用次数: 0

Optimizing irregular applications for energy and performance on the Tilera many-core architecture 在Tilera多核架构上优化不规则应用程序的能源和性能

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742865

D. Chavarría-Miranda, Ajay Panyala, M. Halappanavar, J. Manzano, Antonino Tumeo

{"title":"Optimizing irregular applications for energy and performance on the Tilera many-core architecture","authors":"D. Chavarría-Miranda, Ajay Panyala, M. Halappanavar, J. Manzano, Antonino Tumeo","doi":"10.1145/2742854.2742865","DOIUrl":"https://doi.org/10.1145/2742854.2742865","url":null,"abstract":"Optimizing applications simultaneously for energy and performance is a complex problem. High performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy efficient, many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications -- the Louvain method for community detection (Grappolo), and high-performance conjugate gradient (HPCCG) -- on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123298555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Authentication and privacy preserving message transfer scheme for vehicular ad hoc networks (VANETs) 车载自组织网络(vanet)的身份验证和保密消息传输方案

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2745718

Kuldeep Singh, P. Saini, S. Rani, Awadhesh Kumar Singh

引用次数: 5

Enhanced GPU-based distributed breadth first search 增强的基于gpu的分布式广度优先搜索

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742887

M. Bernaschi, Giancarlo Carbone, Enrico Mastrostefano, M. Bisson, M. Fatica

引用次数: 10

Fast packet forwarding engine based on software circuits 基于软件电路的快速包转发引擎

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742862

M. Makkes, A. Varbanescu, C. D. Laat, R. Meijer

引用次数: 0

A matrix multiplier case study for an evaluation of a configurable dataflow-machine 一个矩阵乘法器案例研究，用于可配置数据流机器的评估

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2747287

L. Verdoscia, R. Vaccaro, R. Giorgi

引用次数: 14

Optimal allocation of virtual resources using genetic algorithm in cloud environments 云环境下基于遗传算法的虚拟资源优化分配

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2744722

K. D. Babu, D. Kumar, Suresh Veluru

引用次数: 6

HRF: a resource allocation scheme for moldable jobs HRF:可塑作业的资源分配方案

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742870

Song Wu, Qiong Tuo, Hai Jin, Chuxiong Yan, Qizheng Weng

{"title":"HRF: a resource allocation scheme for moldable jobs","authors":"Song Wu, Qiong Tuo, Hai Jin, Chuxiong Yan, Qizheng Weng","doi":"10.1145/2742854.2742870","DOIUrl":"https://doi.org/10.1145/2742854.2742870","url":null,"abstract":"Moldable jobs, which allow the number of allocated processors to be adjusted before running in clusters, have attracted increasing concern in parallel job scheduling research. Compared with traditional rigid jobs where the number of allocated processors is fixed, moldable jobs are more flexible and therefore have more potential for improving their average turnaround time (a crucial metric to describe performance of jobs in a cluster). Average turnaround time of moldable jobs depends greatly on resource allocation schemes. Unfortunately, existing schemes do not perform well in reducing average turnaround time, either because they only consider a single job's turnaround time instead of the average turnaround time of all jobs, or because they just aim at fairness between short and long jobs instead of their average turnaround time. In this paper, we investigate how resource allocation affects the average turnaround time of moldable jobs in clusters, and propose a scheme named HRF (highest revenue first), which allocates processors according to the highest revenue of shortening runtime. In our simulations, experimental results show that HRF can reduce average turnaround time up to 71% when compared with state-of-the-art schemes.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"50 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116318304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Data access optimization in a processing-in-memory system 内存处理系统中的数据访问优化

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742863

Zehra Sura, A. Jacob, Tong Chen, Bryan S. Rosenburg, Olivier Sallenave, C. Bertolli, S. Antão, J. Brunheroto, Yoonho Park, K. O'Brien, R. Nair

{"title":"Data access optimization in a processing-in-memory system","authors":"Zehra Sura, A. Jacob, Tong Chen, Bryan S. Rosenburg, Olivier Sallenave, C. Bertolli, S. Antão, J. Brunheroto, Yoonho Park, K. O'Brien, R. Nair","doi":"10.1145/2742854.2742863","DOIUrl":"https://doi.org/10.1145/2742854.2742863","url":null,"abstract":"The Active Memory Cube (AMC) system is a novel heterogeneous computing system concept designed to provide high performance and power-efficiency across a range of applications. The AMC architecture includes general-purpose host processors and specially designed in-memory processors (processing lanes) that would be integrated in a logic layer within 3D DRAM memory. The processing lanes have large vector register files but no power-hungry caches or local memory buffers. Performance depends on how well the resulting higher effective memory latency within the AMC can be managed. In this paper, we describe a combination of programming language features, compiler techniques, operating system interfaces, and hardware design that can effectively hide memory latency for the processing lanes in an AMC system. We present experimental data to show how this approach improves the performance of a set of representative benchmarks important in high performance computing applications. As a result, we are able to achieve high performance together with power efficiency using the AMC architecture.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132866585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 67