Proceedings of the ACM International Conference on Computing Frontiers: Latest Papers (Page 6)

Shared resource aware scheduling on power-constrained tiled many-core processors

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903490

S. S. Jha, W. Heirman, Ayose Falcón, Jordi Tubella, Antonio González, L. Eeckhout

Citations: 11

CryoCMOS hardware technology a classical infrastructure for a scalable quantum computer

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2906828

H. Homulle, Stefan Visser, B. Patra, G. Ferrari, E. Prati, C. G. Almudever, K. Bertels, F. Sebastiano, E. Charbon

Citations: 9

InfiniCortex: present and future invited paper

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2912887

M. Michalewicz, T. Lian, Lim Seng, Jonathan Low, D. Southwell, Jason Gunthorpe, Gabriel Noaje, Dominic Chien, Yves Poppe, Jakub Chrzeszczyk, Andrew Howard, Tin Wee Tan, Sing-Wu Liou

Abstract: Commencing in June 2014, the A*STAR Computational Resource Centre (A*CRC) team in Singapore, together with dozens of partners worldwide, has been building the InfiniCortex. Four concepts are integrated to realise InfiniCortex: i) high-bandwidth (~10 to 100 Gbps) intercontinental connectivity between four continents: Asia, North America, Australia and Europe; ii) InfiniBand extension technology supporting transcontinental distances using Obsidian's Longbow range extenders; iii) connecting separate InfiniBand subnets with different network topologies to create a single computational resource, the Galaxy of Supercomputers [10]; and iv) running workflows and applications on such a distributed computational infrastructure. We have successfully demonstrated InfiniCortex prototypes at the SC14 and SC15 conferences. The infrastructure comprised computing resources residing at multiple locations in Singapore, Japan, Australia, USA, Canada, France and Poland. Various concurrent applications were demonstrated, including workflows, I/O-heavy applications enabled with the ADIOS system, Extempore real-time interactive applications, and in-situ real-time visualisations. In this paper we briefly report on the basic ideas behind the InfiniCortex construct, our recent successes, and some ideas about further growth and extension of this project.

Citations: 3

Lock-based synchronization for GPU architectures

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903155

Yunlong Xu, Lan Gao, Rui Wang, Zhongzhi Luan, Weiguo Wu, D. Qian

Abstract: Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications exhibit a large amount of data sharing among concurrently executing threads. Such data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, existing GPU lock implementations either incur frequent concurrency bugs or lead to extremely low hardware utilization because of the Single Instruction Multiple Threads (SIMT) execution paradigm of GPUs. To let more applications with data sharing benefit from GPU acceleration, we propose a new locking scheme for GPU architectures. The proposed scheme allows lock stealing within individual warps to avoid the concurrency bugs caused by the SIMT execution of GPUs. Moreover, it adopts lock virtualization to reduce the memory cost of fine-grained GPU locks. To illustrate the usage and benefit of GPU locks, we apply the proposed locking scheme to Delaunay mesh refinement (DMR), an application involving massive data sharing among threads. Our lock-based implementation can achieve a 1.22x speedup over an implementation based on algorithmic optimization (which uses a synchronization mechanism tailored for DMR), with 94% less memory cost.

Citations: 24
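The Xu et al. abstract names lock virtualization as the way the scheme cuts the memory cost of fine-grained locks, but it does not say how that works. As a hedged, CPU-side illustration of the general goal (decoupling the number of logical locks from the memory they occupy), the Python sketch below uses lock striping: logical lock IDs are hashed onto a fixed pool of physical locks. This is a swapped-in, well-known technique, not the authors' GPU mechanism; it does not model SIMT execution or the intra-warp lock stealing the abstract describes, and the class and method names are hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Hashable, Iterable, Iterator


class StripedLockTable:
    """Hypothetical lock table: many logical locks share a fixed pool of physical locks.

    Memory grows with `num_slots`, not with the number of protected objects.
    The cost is false contention when two logical IDs map to the same slot.
    """

    def __init__(self, num_slots: int) -> None:
        if num_slots <= 0:
            raise ValueError("num_slots must be positive")
        self._locks = [threading.Lock() for _ in range(num_slots)]

    def slot(self, logical_id: Hashable) -> int:
        # Map a logical lock ID (e.g. a mesh-element index) to a physical slot.
        return hash(logical_id) % len(self._locks)

    def lock_for(self, logical_id: Hashable) -> threading.Lock:
        return self._locks[self.slot(logical_id)]

    @contextmanager
    def acquire_all(self, logical_ids: Iterable[Hashable]) -> Iterator[None]:
        """Hold every slot covering `logical_ids` (e.g. all elements of a cavity).

        Slots are deduplicated because threading.Lock is not reentrant and two IDs
        may share a slot; they are acquired in ascending order so that concurrent
        callers cannot deadlock on each other.
        """
        slots = sorted({self.slot(i) for i in logical_ids})
        acquired = []
        try:
            for s in slots:
                self._locks[s].acquire()
                acquired.append(s)
            yield
        finally:
            for s in reversed(acquired):
                self._locks[s].release()


if __name__ == "__main__":
    table = StripedLockTable(num_slots=4)
    counts = [0] * 8  # 8 hypothetical mesh elements guarded by only 4 physical locks

    def worker() -> None:
        for _ in range(1000):
            for i in range(8):
                # Each update touches an element and its neighbour, like a tiny cavity.
                with table.acquire_all([i, (i + 1) % 8]):
                    counts[i] += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counts)
```

The trade-off sits in `slot`: fewer slots mean less lock memory but more unrelated elements contending for the same physical lock.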

Malevolent app pairs: an Android permission overpassing scheme

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911706

Antonios Dimitriadis, P. Efraimidis, Vasilios Katos

Citations: 9

Power and clock gating modelling in coarse grained reconfigurable systems

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911713

Tiziana Fanni, Carlo Sau, P. Meloni, L. Raffo, F. Palumbo

Citations: 14

Energy reduction in video systems: the GreenVideo project

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911716

M. Pelcat, Erwan Nogues, X. Ducloux

Abstract: With the current progress in microelectronics and the constant increase in network bandwidth, video applications are becoming ubiquitous, especially in the context of mobility. By 2019, 80% of worldwide Internet traffic is projected to be video. Nevertheless, optimizing the energy consumption of video processing is still a challenge due to the large amount of processed data. This talk will concentrate on the energy optimization of video codecs. The first part will present the Green Metadata initiative. In November 2014, MPEG released a new standard, named Green Metadata, that fosters energy-efficient media on consumer devices. This standard specifies metadata to be transmitted between encoder and decoder to reduce power consumption during encoding, decoding and display. The different metadata considered in the standard will be presented, and the Green Adaptive Streaming proposal will be detailed. The second part will present the energy optimization of an HEVC decoder implemented on a modern MP-SoC, detailing the techniques used to implement an HEVC decoder efficiently on a general-purpose processor (GPP). Different levels of parallelism have been exploited to increase and exploit slack time. A sophisticated DVFS mechanism has been developed to handle the variability of the decoding process from frame to frame. To obtain further energy gains, approximate computing is exploited to propose a modified HEVC decoder capable of tuning its energy gains while managing the trade-off between decoding quality and energy. The work detailed in this second part of the talk is the result of the French GreenVideo FUI project.

Citations: 1
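The GreenVideo abstract mentions a DVFS mechanism that handles per-frame variability in decoding work but does not describe the policy. As a hedged illustration of the general idea of per-frame DVFS, the Python sketch below picks, for each frame, the lowest available frequency whose predicted decoding time still meets the frame deadline. The function `pick_frequency`, the frequency table `FREQS_HZ`, the 50 fps deadline and the per-frame cycle estimates are all hypothetical; a real decoder would have to predict each frame's cycle count (for example from previous frames), and the sketch ignores frequency-switching overhead.

```python
from typing import Sequence

# Hypothetical operating points in Hz; real platforms expose their own tables.
FREQS_HZ = [600e6, 1.0e9, 1.4e9, 2.0e9]


def pick_frequency(cycles: float, deadline_s: float, freqs_hz: Sequence[float]) -> float:
    """Return the lowest frequency that decodes `cycles` within `deadline_s`.

    Lower frequency (and typically lower voltage) reduces power, so the slowest
    operating point that still meets the deadline is chosen. If none is fast
    enough, the highest one is returned and the frame misses its deadline.
    """
    for f in sorted(freqs_hz):
        if cycles / f <= deadline_s:
            return f
    return max(freqs_hz)


if __name__ == "__main__":
    deadline = 1 / 50  # 50 fps -> 20 ms per frame (hypothetical)
    predicted_cycles = [10e6, 18e6, 25e6, 45e6]  # hypothetical per-frame estimates
    for c in predicted_cycles:
        f = pick_frequency(c, deadline, FREQS_HZ)
        print(f"{c / 1e6:.0f} Mcycles -> {f / 1e6:.0f} MHz")
```

A fuller policy would also weigh switching latency and static (leakage) power, which on some hardware can make running fast and then idling the cheaper choice.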

A non-von Neumann continuum computer architecture for scalability beyond Moore's law

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903486

M. Brodowicz, T. Sterling

Abstract: A strategic challenge confronting the continued advance of high performance computing (HPC) to extreme scale is the approach of near-nanoscale semiconductor technology and the end of Moore's Law. This paper introduces the foundations of an innovative class of parallel architecture that reverses many conventional architecture directions while benefiting from substantial prior art of previous decades. The Continuum Computer Architecture (CCA) eschews traditional von Neumann-derived processing logic, instead employing structures composed of fine-grain cells (fontons) that combine functional units, memory, and network. The paper describes how CCA systems of various scales may be organized and implemented using currently available technology. Because programming such systems differs substantially from established practice, the still-experimental ParalleX execution model is introduced as a guide for implementing the related software stack layers, ranging from the operating system to application-level constructs. Finally, the HPX-5 runtime system, an advanced implementation of ParalleX core components, is presented as an intermediate software methodology for CCA system computation resource management.

Citations: 3

Optimizing sparse matrix computations through compiler-assisted programming

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903157

K. Rietveld, H. Wijshoff

Citations: 5

Big data analytics and the LHC

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2917755

M. Girone

Abstract: The Large Hadron Collider is one of the largest and most complicated pieces of scientific apparatus ever constructed. The detectors along the LHC ring see as many as 800 million proton-proton collisions per second. One event in 10^11 is new physics, and a hierarchical series of steps is used to extract this tiny signal from an enormous background. High energy physics (HEP) has long been a driver in managing and processing enormous scientific datasets and in operating the largest-scale high-throughput computing centers. HEP developed one of the first scientific computing grids, which now regularly operates 500k processor cores and half an exabyte of disk storage located on 5 continents, including hundreds of connected facilities. In this presentation I will discuss the techniques used to extract scientific discovery from a large and complicated dataset. While HEP has developed many tools and techniques for handling big datasets, there is an increasing desire within the field to make more effective use of additional industry developments. I will discuss some of the ongoing work to adopt industry techniques in big data analytics to improve the discovery potential of the LHC and the effectiveness of the scientists who work on it.

Citations: 1
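As a rough worked estimate using only the two figures quoted in the Girone abstract (a peak of 800 million collisions per second and a new-physics fraction of one in 10^11), the expected rate of new-physics events, written here as r_new, is:

```latex
\[
r_{\text{new}} \approx \left(8\times10^{8}\ \text{s}^{-1}\right)\times 10^{-11}
  = 8\times10^{-3}\ \text{s}^{-1},
\qquad
\frac{1}{r_{\text{new}}} = 125\ \text{s},
\qquad
r_{\text{new}} \times 3600\ \text{s/h} \approx 29\ \text{h}^{-1}
\]
```

That is roughly one such event every two minutes at the peak collision rate, with everything else being background the hierarchical selection steps must filter out. The estimate treats both quoted figures as exact, which the abstract does not claim.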