Proceedings of the ACM International Conference on Computing Frontiers最新文献

筛选
英文 中文
A non von neumann continuum computer architecture for scalability beyond Moore's law 一个非冯·诺伊曼连续体计算机体系结构的可扩展性超越摩尔定律
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903486
M. Brodowicz, T. Sterling
{"title":"A non von neumann continuum computer architecture for scalability beyond Moore's law","authors":"M. Brodowicz, T. Sterling","doi":"10.1145/2903150.2903486","DOIUrl":"https://doi.org/10.1145/2903150.2903486","url":null,"abstract":"A strategic challenge confronting the continued advance of high performance computing (HPC) to extreme scale is the approaching near-nanoscale semiconductor technology and the end of Moore's Law. This paper introduces the foundations of an innovative class of parallel architecture reversing many of the conventional architecture directions, but benefiting from substantial prior art of previous decades. The Continuum Computer Architecture, or CCA, eschews traditional von Neumann-derived processing logic, instead employing structures composed of fine-grain cells (fontons) that combine functional units, memory, and network. The paper describes how CCA systems of various scales may be organized and implemented using currently available technology. As programming of such systems substantially differs from established practices, a still experimental ParalleX execution model is introduced to be used as a guide for the implementation of related software stack layers, ranging from the operating system to application level constructs. Finally, the HPX-5 runtime system, an advanced implementation of ParalleX core components, is presented as an intermediate software methodology for CCA system computation resource management.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132785366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Power and clock gating modelling in coarse grained reconfigurable systems 粗粒度可重构系统中的功率和时钟门控建模
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911713
Tiziana Fanni, Carlo Sau, P. Meloni, L. Raffo, F. Palumbo
{"title":"Power and clock gating modelling in coarse grained reconfigurable systems","authors":"Tiziana Fanni, Carlo Sau, P. Meloni, L. Raffo, F. Palumbo","doi":"10.1145/2903150.2911713","DOIUrl":"https://doi.org/10.1145/2903150.2911713","url":null,"abstract":"Power reduction is one of the biggest challenges in modern systems and tends to become a severe issue dealing with complex scenarios. To provide high-performance and flexibility, designers often opt for coarse-grained reconfigurable (CGR) systems. Nevertheless, these systems require specific attention to the power problem, since large set of resources may be underutilized while computing a certain task. This paper focuses on this issue. Targeting CGR devices, we propose a way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system. The proposed flow guides designers towards optimal implementations, saving designer effort and time.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128285629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Optimizing sparse matrix computations through compiler-assisted programming 通过编译器辅助编程优化稀疏矩阵计算
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903157
K. Rietveld, H. Wijshoff
{"title":"Optimizing sparse matrix computations through compiler-assisted programming","authors":"K. Rietveld, H. Wijshoff","doi":"10.1145/2903150.2903157","DOIUrl":"https://doi.org/10.1145/2903150.2903157","url":null,"abstract":"Existing high-performance implementations of sparse matrix codes are intricate and result in large code bases. In fact, a single floating-point operation requires 400 to 600 lines of additional code to \"prepare\" this operation. This imbalance severely obscures code development, thereby complicating maintenance and portability. In this paper, we propose a drastically different approach in order to continue to effectively handle these codes. We propose to only specify the essence of the computation on the level of individual matrix elements. All additional source code to embed these computations are then generated and optimized automatically by the compiler. This approach is far superior to existing library approaches and allows code to perform scatter/gather operations, matrix reordering, matrix data structure handling, handling of fill-in, etc., to be generated automatically. Experiments show that very efficient data structures can be generated and the resulting codes can be very competitive.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133671812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Breadth first search vectorization on the Intel Xeon Phi 基于Intel Xeon Phi处理器的宽度优先搜索矢量化
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-04-11 DOI: 10.1145/2903150.2903180
Mireya Paredes, G. Riley, M. Luján
{"title":"Breadth first search vectorization on the Intel Xeon Phi","authors":"Mireya Paredes, G. Riley, M. Luján","doi":"10.1145/2903150.2903180","DOIUrl":"https://doi.org/10.1145/2903150.2903180","url":null,"abstract":"Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large scale analysis of information in a variety of applications including social networks, graph databases and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize the BFS. However, due to the irregular memory access patterns and the unstructured nature of the large graphs, its efficient parallelization is a challenge. The Xeon Phi is a massively parallel architecture available as an off-the-shelf accelerator, which includes a powerful 512 bit vector unit with optimized scatter and gather functions. Given its potential benefits, work related to graph traversing on this architecture is an active area of research. We present a set of experiments in which we explore architectural features of the Xeon Phi and how best to exploit them in a top-down BFS algorithm but the techniques can be applied to the current state-of-the-art hybrid, top-down plus bottom-up, algorithms. We focus on the exploitation of the vector unit by developing an improved highly vectorized OpenMP parallel algorithm, using vector intrinsics, and understanding the use of data alignment and prefetching. In addition, we investigate the impact of hyperthreading and thread affinity on performance, a topic that appears under researched in the literature. As a result, we achieve what we believe is the fastest published top-down BFS algorithm on the version of Xeon Phi used in our experiments. The vectorized BFS top-down source code presented in this paper can be available on request as free-to-use software.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
Towards co-designed optimizations in parallel frameworks: a MapReduce case study 迈向并行框架中的协同设计优化:一个MapReduce案例研究
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-03-31 DOI: 10.1145/2903150.2903162
Colin Barrett, Christos Kotselidis, M. Luján
{"title":"Towards co-designed optimizations in parallel frameworks: a MapReduce case study","authors":"Colin Barrett, Christos Kotselidis, M. Luján","doi":"10.1145/2903150.2903162","DOIUrl":"https://doi.org/10.1145/2903150.2903162","url":null,"abstract":"The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of a such multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically without requiring modification to application code. The optimizer is able to speedup the execution time of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code, during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124116797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Proceedings of the ACM International Conference on Computing Frontiers ACM计算前沿国际会议论文集
{"title":"Proceedings of the ACM International Conference on Computing Frontiers","authors":"","doi":"10.1145/2903150","DOIUrl":"https://doi.org/10.1145/2903150","url":null,"abstract":"","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121403285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信