Proceedings of the 12th ACM International Conference on Computing Frontiers: Latest Articles

An energy-efficient custom architecture for the SKA1-low central signal processor
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742855
Leandro Fiorin, E. Vermij, J. V. Lunteren, R. Jongerius, C. Hagleitner
Abstract: The Square Kilometre Array (SKA) will be the biggest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. This paper explores the design of a custom architecture for the central signal processor (CSP) of the SKA1-Low, the SKA's aperture-array instrument consisting of 131,072 antennas. The SKA1-Low's antennas receive signals between 50 and 350 MHz. After digitization and preliminary processing, samples are moved to the CSP for further processing. In this work, we describe the challenges in building the CSP, and present a first quantitative study for the implementation of a custom hardware architecture for processing the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate a power consumption of 14.4 W for processing all channels of a sub-band and an energy efficiency at application level of up to 208 GFLOPS/W for our architecture.
Citations: 9
Scaling application properties to exascale
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742860
Giovanni Mariani, Andreea Anghel, R. Jongerius, G. Dittmann
Abstract: Exascale computing systems will execute computationally intensive tasks on unprecedented amounts of data. Tuning the design of such systems for a specific application or for an application domain is a challenging task, as it is not yet possible to analyze the actual run-time behavior of exascale applications. Run-time properties, such as the memory access pattern, the available instruction-level parallelism and the instruction mix, are valuable information for architects to tune the processing elements, the memory system and the communication infrastructure. We propose a methodology for extrapolating application properties at exascale from an analysis of workload sizes feasible on current systems. The methodology is suitable for applications scaling over different parameters (e.g., the number of vertices and the number of edges are two parameters in a graph algorithm). The proposed methodology combines a) a statistically sound approach for model selection and b) knowledge coming from computational theory, such as the order of complexity of the application under analysis. Compared with state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy of memory access pattern prediction by up to 1.3×.
Citations: 5
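As a rough illustration of the extrapolation idea in the abstract above, the following Python sketch fits a few candidate complexity models to small-scale measurements, picks one by leave-one-out cross-validation, and extrapolates to a larger problem size. The candidate set, the single-coefficient fit, the leave-one-out criterion, and all names are assumptions made for illustration; the abstract does not specify the authors' actual model-selection procedure.

```python
import math

# Hypothetical candidate models, standing in for "knowledge coming from
# computational theory" (the order of complexity of the application).
CANDIDATES = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log(n),
    "n^2": lambda n: n * n,
}


def fit_scale(f, sizes, counts):
    """Least-squares coefficient c for the model counts ~= c * f(size)."""
    num = sum(f(n) * y for n, y in zip(sizes, counts))
    den = sum(f(n) ** 2 for n in sizes)
    return num / den


def loo_error(f, sizes, counts):
    """Mean squared relative error under leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(sizes)):
        train_n = sizes[:i] + sizes[i + 1:]
        train_y = counts[:i] + counts[i + 1:]
        c = fit_scale(f, train_n, train_y)
        total += ((c * f(sizes[i]) - counts[i]) / counts[i]) ** 2
    return total / len(sizes)


def select_and_extrapolate(sizes, counts, target):
    """Pick the candidate with the lowest held-out error, then predict at target."""
    best = min(CANDIDATES, key=lambda name: loo_error(CANDIDATES[name], sizes, counts))
    f = CANDIDATES[best]
    return best, fit_scale(f, sizes, counts) * f(target)


# Synthetic "instruction counts" measured at sizes feasible today.
sizes = [10, 20, 40, 80, 160]
counts = [3 * n * n for n in sizes]
best_model, prediction = select_and_extrapolate(sizes, counts, target=1000)
print(best_model, prediction)
```

The synthetic data here are generated from a quadratic law, so the sketch only demonstrates the selection mechanics, not the accuracy figures reported in the abstract.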
Optimizing the accuracy of a rocket trajectory simulation by program transformation
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742894
Nasrine Damouche, M. Martel, Alexandre Chapoutot
Abstract: Static analysis by abstract interpretation is one of the most successful techniques used to over-approximate the roundoff errors in numerical programs. In our case, we are interested in using this method to improve the accuracy of programs which perform floating-point computations, known for their sensitivity to the way formulas are written. We are interested in automatically transforming pieces of code by applying several rewriting rules to them. In this article, we demonstrate the effectiveness of our approach on a non-trivial numerical simulation code.
Citations: 7
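The sensitivity of floating-point code "to the way formulas are written", mentioned in the abstract above, can be seen in a minimal Python sketch. It evaluates the same three-term sum under two parenthesizations and compares both against exact rational arithmetic. The function names and input values are illustrative assumptions and are not taken from the paper, whose tool applies rewriting rules automatically.

```python
from fractions import Fraction


def left_assoc(a, b, c):
    """Evaluate (a + b) + c in double precision."""
    return (a + b) + c


def right_assoc(a, b, c):
    """Evaluate a + (b + c) in double precision."""
    return a + (b + c)


def exact_sum(a, b, c):
    """Reference result computed with exact rational arithmetic."""
    return Fraction(a) + Fraction(b) + Fraction(c)


a, b, c = 1e16, -1e16, 1.0
print(left_assoc(a, b, c))   # 1.0: the large terms cancel before c is added
print(right_assoc(a, b, c))  # 0.0: c is absorbed when added to -1e16
print(exact_sum(a, b, c))    # 1
```

Both expressions are equal over the reals, but only the first parenthesization recovers the exact result for these inputs, which is the kind of difference an accuracy-driven rewriting could exploit.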
Green adaptive streaming
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2747289
X. Ducloux
Abstract: In November 2014, MPEG released a new standard, named Green Metadata, that enables energy-efficient media consumption on consumer devices [4]. This standard specifies metadata for reducing power consumption during encoding, decoding and display. In particular, it specifies metadata for energy-efficient media selection using the DASH standard [3]. This paper describes the concept of Green Adaptive Streaming and presents the demonstration which has been prepared as a proof of concept.
Citations: 1
Programmer-directed partial redundancy for resilient HPC
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742903
Omer Subasi, J. Moreno, O. Unsal, Jesús Labarta, A. Cristal
Abstract: In this work we propose partial task replication and checkpointing for task-parallel HPC applications to mitigate silent data corruption (SDC) errors. As the complete replication of all application tasks can be prohibitive due to resource costs, we introduce a programmer-directed selective replication mechanism to provide fault tolerance while decreasing costs. Results show that our scheme detects and corrects around 65% of SDC errors with only 4% overhead on average.
Citations: 24
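A minimal Python sketch of the selective-replication idea in the abstract above: a task marked by the programmer is run twice from a saved copy of its inputs, the results are compared, and a third run breaks the tie if they differ. The `replicate` flag, the majority-vote recovery, and the `FlakySum` fault injector are all hypothetical choices made for illustration; the abstract does not describe the authors' runtime or API.

```python
import copy


def run_with_replication(task, inputs, replicate):
    """Run task once, or redundantly with comparison if the programmer marked it."""
    if not replicate:
        return task(copy.deepcopy(inputs))
    checkpoint = copy.deepcopy(inputs)        # saved inputs used for re-execution
    first = task(copy.deepcopy(checkpoint))
    second = task(copy.deepcopy(checkpoint))
    if first == second:                       # replicas agree: no corruption detected
        return first
    third = task(copy.deepcopy(checkpoint))   # mismatch: re-execute from the checkpoint
    if third == first or third == second:
        return third                          # majority result
    raise RuntimeError("replicas disagree; corruption could not be corrected")


class FlakySum:
    """Hypothetical task that silently corrupts its result on one chosen call."""

    def __init__(self, faulty_call):
        self.calls = 0
        self.faulty_call = faulty_call

    def __call__(self, values):
        self.calls += 1
        result = sum(values)
        return result + 1 if self.calls == self.faulty_call else result


protected_task = FlakySum(faulty_call=1)
protected = run_with_replication(protected_task, [1, 2, 3], replicate=True)

unprotected_task = FlakySum(faulty_call=1)
unprotected = run_with_replication(unprotected_task, [1, 2, 3], replicate=False)
print(protected, unprotected)
```

Replicating only the marked tasks is what keeps the overhead bounded: the unmarked task runs once and its corruption goes unnoticed, while the marked one pays for up to three executions.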
Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2747288
Igor Loi, D. Rossi, Germain Haugou, Michael Gautschi, L. Benini
Abstract: L1 instruction caches in many-core systems represent a sizable fraction of the total power consumption. Although large instruction caches can significantly improve performance, they have the potential to increase power consumption. Private caches are usually able to achieve higher speed, due to their simpler design, but the smaller L1 memory space seen by each core induces a high miss ratio. A shared instruction cache can be seen as an attractive solution to improve performance and energy efficiency while reducing area. In this paper we propose a multi-banked, shared instruction cache architecture suitable for ultra-low power multicore systems, where parallelism and near-threshold operation are used to achieve minimum energy. We implemented the cluster architecture with different configurations of cache sharing, utilizing the 28nm UTBB FD-SOI from STMicroelectronics as reference technology. Experimental results, based on several real-life applications, demonstrate that sharing mechanisms have no impact on the system operating frequency. Relative to a cluster of processing elements with private program caches, they make it possible either to reduce the energy consumption of the cache subsystem by up to 10% while keeping the same area footprint, or to reduce the overall shared cache area by 2× while keeping the same performance and energy efficiency.
Citations: 19
Just-in-time component-wise power and thermal modeling
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742880
S. Rahman, Qing Yi, H. Homayoun
Abstract: As computer systems increasingly focus on balancing the performance and power efficiency of software applications together with temperature variations of the machine, they need to understand how software applications utilize the various architecture components differently. This paper develops a power and temperature modeling framework to provide such timely feedback, which can then be used to support a dynamic optimization system to attain better energy efficiency for applications. In particular, we present a framework that combines McPAT [17], a cycle-accurate architecture simulation model, with runtime hardware performance counter statistics, to attain a component-wise power consumption breakdown of applications while running at GHz speed. Our framework is able to consistently achieve 98% accuracy when compared to the actual system-level power consumption measured using a real-time power meter [1]. Finally, we present a preliminary study to demonstrate the potential of using our framework to support the optimization of applications for better energy efficiency.
Citations: 1
Position-aware thread-level speculative parallelization for large-scale chip-multiprocessor
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742866
L. Yanhua, Zhang Youhui, Zheng Weimin
Abstract: Thread-Level Speculation (TLS) is an effective mechanism for automatic parallelization of sequential programs, especially on large-scale chip multiprocessors (CMPs), which are rich in idle on-chip computation resources. TLS can use these idle resources to improve the performance of sequential programs. However, the inter-thread correlation between speculative threads requires more careful core assignment and thread scheduling for TLS execution than for conventional threads. Analysis shows that there is a high correlation between TLS execution performance and the on-chip "position" of the cores assigned for the TLS execution. Accordingly, we propose a "position-aware" task scheduling strategy for thread-level speculative parallelization. We introduce a model to evaluate the "Centre of Data Gravity (CDG)" of the TLS program, and propose a new core assignment and thread scheduling mechanism based on CDG for the TLS execution. Tests show that these strategies achieve significant performance improvement: compared with the original TLS, which does not consider this factor, the performance improvement ranges from 4.6% to 39%.
Citations: 0
ETD-Cache: an expiration-time driven cache scheme to make SSD-based read cache endurable and cost-efficient
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742881
Ningwei Dai, Yunpeng Chai, Yushi Liang, Chunling Wang
Abstract: Recently, flash-based solid-state drives (SSDs) have been widely deployed as cache devices to boost system performance. However, classical SSD cache algorithms (e.g., LRU) replace the cached data frequently to maintain high hit rates. Such aggressive data updating strategies result in too many write operations on SSDs and make them wear out quickly, which ultimately leads to high SSD costs for enterprise applications. In this paper, we propose a novel Expiration-Time Driven Cache (ETD-Cache) method to solve this problem. In ETD-Cache, an active data eviction mechanism is adopted. An already cached block leaves the SSD cache if and only if there is no access to it for a time longer than a specified expiration time. This mechanism gives the cached contents more time to wait for their subsequent accesses and limits the admission of newly arrived blocks, generating fewer SSD writes. In addition, a low-overhead candidate management module is designed to maintain the most popular data in the system for potential cache replacement. Simulations driven by a series of typical real-world traces indicate that, owing to the great reduction in data updating frequency, ETD-Cache lowers the total SSD costs by 98.45% compared with LRU under the same cache hit rate.
Citations: 10
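The eviction rule stated in the abstract above (a block leaves the cache only after going unaccessed for longer than the expiration time) can be sketched in a few lines of Python. This is a simplified illustration under assumptions: the class and attribute names are hypothetical, time is a caller-supplied logical clock, new blocks are admitted only when a slot is free, and the candidate management module described in the abstract is omitted.

```python
class ExpirationTimeCache:
    """Toy read cache that evicts blocks only when their expiration time passes."""

    def __init__(self, capacity, expiration):
        self.capacity = capacity
        self.expiration = expiration
        self.last_access = {}  # block id -> time of most recent access
        self.ssd_writes = 0    # number of blocks written into the cache

    def _evict_expired(self, now):
        """Drop every block that has gone unaccessed for longer than the expiration time."""
        expired = [b for b, t in self.last_access.items() if now - t > self.expiration]
        for block in expired:
            del self.last_access[block]

    def access(self, block, now):
        """Return True on a hit; on a miss, admit the block only if a slot is free."""
        self._evict_expired(now)
        if block in self.last_access:
            self.last_access[block] = now
            return True
        if len(self.last_access) < self.capacity:
            self.last_access[block] = now
            self.ssd_writes += 1
        return False


cache = ExpirationTimeCache(capacity=2, expiration=5)
trace = [("a", 0), ("b", 1), ("c", 2), ("a", 3), ("c", 10)]
hits = [cache.access(block, now) for block, now in trace]
print(hits, cache.ssd_writes)
```

In this trace, block "c" is refused at time 2 because the cache is full and nothing has expired, which is how the rule trades some misses for fewer SSD writes; an LRU cache would have evicted "a" and written "c" immediately.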
Near threshold cloud processors for dark silicon mitigation: the impact on emerging scale-out workloads
Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742878
Jing Wang, Junwei Zhang, Wei-gong Zhang, Keni Qiu, Tao Li, Minhua Wu
Abstract: The breakdown of Dennard scaling has made computing energy-limited, which restricts performance and gives rise to dark silicon. To effectively leverage the increased number of transistors and alleviate the dark silicon problem, designers consider a set of design paradigms in processor manufacturing. Among those, Near-Threshold Voltage Computing (NTC) is a promising candidate. However, prior efforts largely focus on a specific design option based on legacy desktop applications, lacking a comprehensive analysis of emerging scale-out applications with multiple design options. In this paper, we characterize different perspectives, including performance and energy efficiency, in the context of NTC cloud processors by running emerging scale-out workloads. We find that NTC can improve performance by 1.6X and energy efficiency by 50%. Meanwhile, we also show that a tiled-OoO architecture improves the performance of scale-out workloads by up to 3.7X and their energy efficiency by up to 6X over alternative chip organizations, making it a preferable design paradigm for scale-out workloads. We believe that our observations will provide insights for the design of cloud processors in the era of dark silicon.
Citations: 3