S. S. Jha, W. Heirman, Ayose Falcón, Jordi Tubella, Antonio González, L. Eeckhout
{"title":"Shared resource aware scheduling on power-constrained tiled many-core processors","authors":"S. S. Jha, W. Heirman, Ayose Falcón, Jordi Tubella, Antonio González, L. Eeckhout","doi":"10.1145/2903150.2903490","DOIUrl":"https://doi.org/10.1145/2903150.2903490","url":null,"abstract":"Power management through dynamic core, cache and frequency adaptation is becoming a necessity in today's power-constrained many-core environments. Unfortunately, as core count grows, the complexity of both the adaptation hardware and the power management algorithms increases. In this paper, we propose a two-tier hierarchical power management methodology to exploit per-tile voltage regulators and clustered last-level caches. In addition, we include a novel thread migration layer that (i) analyzes threads running on the tiled many-core processor for shared resource sensitivity in tandem with core, cache and frequency adaptation, and (ii) co-schedules threads per tile with compatible behavior.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125755841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Homulle, Stefan Visser, B. Patra, G. Ferrari, E. Prati, C. G. Almudever, K. Bertels, F. Sebastiano, E. Charbon
{"title":"CryoCMOS hardware technology a classical infrastructure for a scalable quantum computer","authors":"H. Homulle, Stefan Visser, B. Patra, G. Ferrari, E. Prati, C. G. Almudever, K. Bertels, F. Sebastiano, E. Charbon","doi":"10.1145/2903150.2906828","DOIUrl":"https://doi.org/10.1145/2903150.2906828","url":null,"abstract":"We propose a classical infrastructure for a quantum computer implemented in CMOS. The peculiarity of the approach is to operate the classical CMOS circuits and systems at deep-cryogenic temperatures (cryoCMOS), so as to ensure physical proximity to the quantum bits, thus reducing thermal gradients and increasing compactness. CryoCMOS technology leverages the CMOS fabrication infrastructure and exploits the continuous effort of miniaturization that has sustained Moore's Law for over 50 years. Such an approach is believed to enable the growth of the number of qubits operating in a fault-tolerant fashion, paving the way to scalable quantum computing machines.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"6 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130012690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Michalewicz, T. Lian, Lim Seng, Jonathan Low, D. Southwell, Jason Gunthorpe, Gabriel Noaje, Dominic Chien, Yves Poppe, Jakub Chrzeszczyk, Andrew Howard, Tin Wee Tan, Sing-Wu Liou
{"title":"InfiniCortex: present and future invited paper","authors":"M. Michalewicz, T. Lian, Lim Seng, Jonathan Low, D. Southwell, Jason Gunthorpe, Gabriel Noaje, Dominic Chien, Yves Poppe, Jakub Chrzeszczyk, Andrew Howard, Tin Wee Tan, Sing-Wu Liou","doi":"10.1145/2903150.2912887","DOIUrl":"https://doi.org/10.1145/2903150.2912887","url":null,"abstract":"Commencing in June 2014, A*STAR Computational Resource Centre (A*CRC) team in Singapore, together with dozens of partners world-wide, have been building the InfiniCortex. Four concepts are integrated together to realise InfiniCortex: i) High bandwidth (~ 10 to 100Gbps) intercontinental connectivity between four continents: Asia, North America, Australia and Europe; ii) InfiniBand extension technology supporting transcontinental distances using Obsidian's Longbow range extenders; iii) Connecting separate InfiniBand sub-nets with different net topologies to create a single computational resource: Galaxy of Supercomputers [10]; iv) Running workflows and applications on such a distributed computational infrastructure. We have successfully demonstrated InfiniCortex prototypes at SC14 and SC15 conferences. The infrastructure comprised computing resources residing at multiple locations in Singapore, Japan, Australia, USA, Canada, France and Poland. Various concurrent applications, including workflows, I/O heavy applications enabled with ADIOS system, Extempore real-time interactive applications, and in-situ realtime visualisations were demonstrated. 
In this paper we briefly report on the basic ideas behind the InfiniCortex construct, our recent successes, and some ideas about further growth and extension of this project.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130869388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunlong Xu, Lan Gao, Rui Wang, Zhongzhi Luan, Weiguo Wu, D. Qian
{"title":"Lock-based synchronization for GPU architectures","authors":"Yunlong Xu, Lan Gao, Rui Wang, Zhongzhi Luan, Weiguo Wu, D. Qian","doi":"10.1145/2903150.2903155","DOIUrl":"https://doi.org/10.1145/2903150.2903155","url":null,"abstract":"Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications manifest an ample amount of data sharing among concurrently executing threads. Often data sharing requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, the existing GPU lock implementations either incur frequent concurrency bugs, or lead to extremely low hardware utilization due to the Single Instruction Multiple Threads (SIMT) execution paradigm of GPUs. To make more applications with data sharing benefit from GPU acceleration, we propose a new locking scheme for GPU architectures. The proposed locking scheme allows lock stealing within individual warps to avoid the concurrency bugs due to the SIMT execution of GPUs. Moreover, it adopts lock virtualization to reduce the memory cost of fine-grain GPU locks. To illustrate the usage and the benefit of GPU locks, we apply the proposed GPU locking scheme to Delaunay mesh refinement (DMR), an application involving massive data sharing among threads. 
Our lock-based implementation can achieve 1.22x speedup over an algorithmic optimization based implementation (which uses a synchronization mechanism tailored for DMR) with 94% less memory cost.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133073720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonios Dimitriadis, P. Efraimidis, Vasilios Katos
{"title":"Malevolent app pairs: an Android permission overpassing scheme","authors":"Antonios Dimitriadis, P. Efraimidis, Vasilios Katos","doi":"10.1145/2903150.2911706","DOIUrl":"https://doi.org/10.1145/2903150.2911706","url":null,"abstract":"Portable smart devices potentially store a wealth of information of personal data, making them attractive targets for data exfiltration attacks. Permission based schemes are core security controls for reducing privacy and security risks. In this paper we demonstrate that current permission schemes cannot effectively mitigate risks posed by covert channels. We show that a pair of apps with different permission settings may collude in order to effectively create a state where a union of their permissions is obtained, giving opportunities for leaking sensitive data, whilst keeping the leak potentially unnoticed. We then propose a solution for such attacks.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133507839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiziana Fanni, Carlo Sau, P. Meloni, L. Raffo, F. Palumbo
{"title":"Power and clock gating modelling in coarse grained reconfigurable systems","authors":"Tiziana Fanni, Carlo Sau, P. Meloni, L. Raffo, F. Palumbo","doi":"10.1145/2903150.2911713","DOIUrl":"https://doi.org/10.1145/2903150.2911713","url":null,"abstract":"Power reduction is one of the biggest challenges in modern systems and tends to become a severe issue when dealing with complex scenarios. To provide high-performance and flexibility, designers often opt for coarse-grained reconfigurable (CGR) systems. Nevertheless, these systems require specific attention to the power problem, since a large set of resources may be underutilized while computing a certain task. This paper focuses on this issue. Targeting CGR devices, we propose a way to model in advance power and clock gating costs on the basis of the functional, technological and architectural parameters of the baseline CGR system. The proposed flow guides designers towards optimal implementations, saving designer effort and time.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128285629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy reduction in video systems: the GreenVideo project","authors":"M. Pelcat, Erwan Nogues, X. Ducloux","doi":"10.1145/2903150.2911716","DOIUrl":"https://doi.org/10.1145/2903150.2911716","url":null,"abstract":"With the current progress in microelectronics and the constant increase of network bandwidth, video applications are becoming ubiquitous and spread especially in the context of mobility. In 2019, 80% of the worldwide Internet traffic will be video. Nevertheless, optimizing the energy consumption for video processing is still a challenge due to the large amount of processed data. This talk will concentrate on the energy optimization of video codecs. In the first part, the Green Metadata initiative will be presented. In November 2014, MPEG released a new standard, named Green Metadata that fosters energy-efficient media on consumer devices. This standard specifies metadata to be transmitted between encoder and decoder for reducing power consumption during encoding, decoding and display. The different metadata considered in the standard will be presented. More specifically, the Green Adaptive Streaming proposition will be detailed. In the second part, the energy optimization of an HEVC decoder implemented on a modern MP-SoC will be presented. The different techniques used to implement efficiently an HEVC decoder on a general-purpose processor (GPP) will be detailed. Different levels of parallelism have been exploited to increase and exploit slack time. A sophisticated DVFS mechanism has been developed to handle the variability of the decoding process for each frame. To get further energy gains, the concept of approximate computing is exploited to propose a modified HEVC decoder capable of tuning its energy gains while managing the decoding quality versus energy trade-off. 
The work detailed in this second part of the talk is the result of the French GreenVideo FUI project.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134501493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A non von neumann continuum computer architecture for scalability beyond Moore's law","authors":"M. Brodowicz, T. Sterling","doi":"10.1145/2903150.2903486","DOIUrl":"https://doi.org/10.1145/2903150.2903486","url":null,"abstract":"A strategic challenge confronting the continued advance of high performance computing (HPC) to extreme scale is the approaching near-nanoscale semiconductor technology and the end of Moore's Law. This paper introduces the foundations of an innovative class of parallel architecture reversing many of the conventional architecture directions, but benefiting from substantial prior art of previous decades. The Continuum Computer Architecture, or CCA, eschews traditional von Neumann-derived processing logic, instead employing structures composed of fine-grain cells (fontons) that combine functional units, memory, and network. The paper describes how CCA systems of various scales may be organized and implemented using currently available technology. As programming of such systems substantially differs from established practices, a still experimental ParalleX execution model is introduced to be used as a guide for the implementation of related software stack layers, ranging from the operating system to application level constructs. 
Finally, the HPX-5 runtime system, an advanced implementation of ParalleX core components, is presented as an intermediate software methodology for CCA system computation resource management.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132785366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing sparse matrix computations through compiler-assisted programming","authors":"K. Rietveld, H. Wijshoff","doi":"10.1145/2903150.2903157","DOIUrl":"https://doi.org/10.1145/2903150.2903157","url":null,"abstract":"Existing high-performance implementations of sparse matrix codes are intricate and result in large code bases. In fact, a single floating-point operation requires 400 to 600 lines of additional code to \"prepare\" this operation. This imbalance severely obscures code development, thereby complicating maintenance and portability. In this paper, we propose a drastically different approach in order to continue to effectively handle these codes. We propose to only specify the essence of the computation on the level of individual matrix elements. All additional source code to embed these computations is then generated and optimized automatically by the compiler. This approach is far superior to existing library approaches and allows code to perform scatter/gather operations, matrix reordering, matrix data structure handling, handling of fill-in, etc., to be generated automatically. Experiments show that very efficient data structures can be generated and the resulting codes can be very competitive.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133671812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big data analytics and the LHC","authors":"M. Girone","doi":"10.1145/2903150.2917755","DOIUrl":"https://doi.org/10.1145/2903150.2917755","url":null,"abstract":"The Large Hadron Collider is one of the largest and most complicated pieces of scientific apparatus ever constructed. The detectors along the LHC ring see as many as 800 million proton-proton collisions per second. An event in 10 to the 11th power is new physics and there is a hierarchical series of steps to extract a tiny signal from an enormous background. High energy physics (HEP) has long been a driver in managing and processing enormous scientific datasets and the largest scale high throughput computing centers. HEP developed one of the first scientific computing grids that now regularly operates 500k processor cores and half of an exabyte of disk storage located on 5 continents including hundreds of connected facilities. In this presentation I will discuss the techniques used to extract scientific discovery from a large and complicated dataset. While HEP has developed many tools and techniques for handling big datasets, there is an increasing desire within the field to make more effective use of additional industry developments. 
I will discuss some of the ongoing work to adopt industry techniques in big data analytics to improve the discovery potential of the LHC and the effectiveness of the scientists who work on it.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114168928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}