Proceedings of the 12th ACM International Conference on Computing Frontiers: latest papers, page 4

An energy-efficient custom architecture for the SKA1-low central signal processor

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742855

Leandro Fiorin, E. Vermij, J. V. Lunteren, R. Jongerius, C. Hagleitner

Citations: 9

Scaling application properties to exascale

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742860

Giovanni Mariani, Andreea Anghel, R. Jongerius, G. Dittmann

Abstract: Exascale computing systems will execute computationally intensive tasks on unprecedented amounts of data. Tuning the design of such systems for a specific application or for an application domain is a challenging task as it is not yet possible to analyze the actual run-time behavior of exascale applications. Run-time properties, such as the memory access pattern, the available instruction-level parallelism and the instruction mix, are valuable information for architects to tune the processing elements, the memory system and the communication infrastructure. We propose a methodology for extrapolating application properties at exascale from an analysis of workload sizes feasible on current systems. The methodology is suitable for applications scaling over different parameters (e.g., the number of vertices and edges represent two parameters in a graph algorithm). The proposed methodology combines a) a statistically sound approach for model selection and b) knowledge coming from computational theory, such as the order of complexity of the application under analysis. Compared with state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy of memory access pattern prediction by up to 1.3×.
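
As a rough illustration of the idea in the abstract above (and not the authors' actual method), the Python sketch below fits a few candidate growth laws suggested by complexity analysis to small-scale measurements, selects one by leave-one-out cross-validation, and extrapolates to a larger problem size. The candidate set, the single scaling parameter n, all function names, and the choice of leave-one-out error as the selection criterion are assumptions made for this example; the paper itself targets applications that scale over several parameters.

```python
import math

# Hypothetical candidate growth laws, standing in for "knowledge from
# computational theory" (the order of complexity of the application).
CANDIDATES = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
}


def fit_scale(f, sizes, values):
    """Least-squares estimate of a in the one-parameter model y = a * f(n)."""
    num = sum(f(n) * y for n, y in zip(sizes, values))
    den = sum(f(n) ** 2 for n in sizes)
    return num / den


def loo_error(f, sizes, values):
    """Mean squared relative error under leave-one-out cross-validation."""
    err = 0.0
    for i in range(len(sizes)):
        train_n = sizes[:i] + sizes[i + 1:]
        train_y = values[:i] + values[i + 1:]
        a = fit_scale(f, train_n, train_y)
        err += ((a * f(sizes[i]) - values[i]) / values[i]) ** 2
    return err / len(sizes)


def select_and_extrapolate(sizes, values, target):
    """Pick the candidate with the lowest held-out error, then predict at target."""
    name = min(CANDIDATES, key=lambda k: loo_error(CANDIDATES[k], sizes, values))
    f = CANDIDATES[name]
    return name, fit_scale(f, sizes, values) * f(target)
```

Here `sizes` and `values` are plain lists of measured workload sizes and the corresponding property (for example, an instruction count, which must be non-zero for the relative error to be defined). Scoring candidates on held-out points rather than on training fit is one simple way to avoid always preferring the most flexible model.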

Citations: 5

Optimizing the accuracy of a rocket trajectory simulation by program transformation

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742894

Nasrine Damouche, M. Martel, Alexandre Chapoutot

Citations: 7

Green adaptive streaming

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2747289

X. Ducloux

Citations: 1

Programmer-directed partial redundancy for resilient HPC

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742903

Omer Subasi, J. Moreno, O. Unsal, Jesús Labarta, A. Cristal

Citations: 24

Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2747288

Igor Loi, D. Rossi, Germain Haugou, Michael Gautschi, L. Benini

Abstract: L1 instruction caches in many-core systems represent a sizable fraction of the total power consumption. Although large instruction caches can significantly improve performance, they have the potential to increase power consumption. Private caches are usually able to achieve higher speed, due to their simpler design, but the smaller L1 memory space seen by each core induces a high miss ratio. A shared instruction cache can be seen as an attractive solution to improve performance and energy efficiency while reducing area. In this paper we propose a multi-banked, shared instruction cache architecture suitable for ultra-low power multicore systems, where parallelism and near-threshold operation are used to achieve minimum energy. We implemented the cluster architecture with different configurations of cache sharing, utilizing the 28nm UTBB FD-SOI from STMicroelectronics as reference technology. Experimental results, based on several real-life applications, demonstrate that sharing mechanisms have no impact on the system operating frequency. Relative to a cluster of processing elements with private program caches, they either reduce the energy consumption of the cache subsystem by up to 10% at the same area footprint, or reduce the overall shared cache area by 2× at the same performance and energy efficiency.

Citations: 19

Just-in-time component-wise power and thermal modeling

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742880

S. Rahman, Qing Yi, H. Homayoun

Citations: 1

Position-aware thread-level speculative parallelization for large-scale chip-multiprocessor

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742866

L. Yanhua, Zhang Youhui, Zheng Weimin

Abstract: Thread-Level Speculation (TLS) is an effective mechanism for automatically parallelizing sequential programs, especially on large-scale chip multiprocessors (CMPs), which are rich in idle on-chip computation resources. TLS can use these idle resources to improve the performance of a sequential program. However, the inter-thread correlation between speculative threads requires more careful core assignment and thread scheduling for TLS execution than for conventional threads. Analysis shows that there is a high correlation between TLS execution performance and the on-chip "position" of the cores assigned to the TLS execution. Accordingly, we propose a "position-aware" task scheduling strategy for thread-level speculative parallelization. We introduce a model to evaluate the "Centre of Data Gravity (CDG)" of the TLS program, and propose a new core assignment and thread scheduling mechanism based on CDG for the TLS execution. Tests show that these strategies achieve a significant performance improvement: compared with the original TLS, which does not consider core position, the improvement ranges from 4.6% to 39%.
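
The abstract does not say how the CDG is computed. One plausible reading, shown here purely as an assumption and not as the authors' model, is an access-weighted centroid of tile coordinates on a 2D mesh, with speculative threads then placed on the free cores nearest to that point in Manhattan distance. The function names, the `(x, y)` coordinate scheme, and the distance metric in this Python sketch are all hypothetical.

```python
def centre_of_data_gravity(access_counts):
    """Access-weighted centroid of mesh tiles (an assumed definition).

    access_counts maps a tile coordinate (x, y) to the number of data
    accesses served by that tile.
    """
    total = sum(access_counts.values())
    if total == 0:
        raise ValueError("no accesses recorded")
    cx = sum(x * n for (x, _), n in access_counts.items()) / total
    cy = sum(y * n for (_, y), n in access_counts.items()) / total
    return cx, cy


def assign_cores(free_cores, cdg, n_threads):
    """Pick the n_threads free cores closest to the CDG (Manhattan distance).

    Ties are broken by coordinate so that the result is deterministic.
    """
    cx, cy = cdg
    ranked = sorted(
        free_cores,
        key=lambda p: (abs(p[0] - cx) + abs(p[1] - cy), p),
    )
    return ranked[:n_threads]
```

For instance, if most accesses hit the tile at (2, 0) and a few hit (0, 0), the centroid lies between them but closer to (2, 0), so cores around that tile are preferred over a distant corner of the mesh.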

Citations: 0

ETD-Cache: an expiration-time driven cache scheme to make SSD-based read cache endurable and cost-efficient

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742881

Ningwei Dai, Yunpeng Chai, Yushi Liang, Chunling Wang

Abstract: Recently, flash-based solid-state drives (SSDs) have been widely deployed as cache devices to boost system performance. However, classical SSD cache algorithms (e.g., LRU) replace the cached data frequently to maintain high hit rates. Such aggressive data updating strategies result in too many write operations on SSDs and make them wear out quickly, which ultimately leads to high SSD costs for enterprise applications. In this paper, we propose a novel Expiration-Time Driven Cache (ETD-Cache) method to solve this problem. In ETD-Cache, an active data eviction mechanism is adopted. An already cached block leaves the SSD cache if and only if there is no access to it for a time longer than a specified expiration time. This mechanism gives the cached contents more time to wait for their next accesses and limits the admission of newly arrived blocks, generating fewer SSD writes. In addition, a low-overhead candidate management module is designed to maintain the most popular data in the system for potential cache replacement. Simulations driven by a series of typical real-world traces indicate that, owing to the large reduction in data updating frequency, ETD-Cache lowers the total SSD costs by 98.45% compared with LRU under the same cache hit rate.
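
The eviction rule described in the abstract above (a block leaves the cache only after being idle for longer than the expiration time) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' implementation: the class and attribute names are hypothetical, timestamps are assumed to be non-decreasing, a miss is simply not admitted when the cache is full, and the candidate management module mentioned in the abstract is omitted.

```python
from collections import OrderedDict


class ExpirationTimeCache:
    """Toy expiration-time driven read cache (illustrative only)."""

    def __init__(self, capacity, expiration_time):
        self.capacity = capacity
        self.expiration_time = expiration_time
        # block id -> last access time, kept ordered from oldest to newest
        self.last_access = OrderedDict()
        self.writes = 0  # admissions, i.e. writes to the cache device

    def _evict_expired(self, now):
        # Evict only blocks idle for longer than the expiration time.
        while self.last_access:
            block, t = next(iter(self.last_access.items()))
            if now - t > self.expiration_time:
                del self.last_access[block]
            else:
                break

    def access(self, block, now):
        """Return True on a hit, False on a miss."""
        self._evict_expired(now)
        if block in self.last_access:
            self.last_access[block] = now
            self.last_access.move_to_end(block)
            return True
        if len(self.last_access) < self.capacity:
            # Admit only when there is free space (assumed policy).
            self.last_access[block] = now
            self.writes += 1
        return False
```

Unlike LRU, a miss never forces out a resident block here, so the number of admissions (and therefore device writes) is bounded by how quickly blocks expire rather than by the miss rate.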

Citations: 10

Near threshold cloud processors for dark silicon mitigation: the impact on emerging scale-out workloads

Proceedings of the 12th ACM International Conference on Computing Frontiers Pub Date : 2015-05-06 DOI: 10.1145/2742854.2742878

Jing Wang, Junwei Zhang, Wei-gong Zhang, Keni Qiu, Tao Li, Minhua Wu

Abstract: The breakdown of Dennard scaling has made computing energy-limited, which restricts performance and gives rise to dark silicon. To effectively leverage the increased number of transistors and alleviate the dark silicon problem, designers consider a set of design paradigms in processor manufacturing. Among those, Near-Threshold Voltage Computing (NTC) is a promising candidate. However, prior efforts largely focus on a specific design option based on legacy desktop applications, lacking comprehensive analysis of emerging scale-out applications with multiple design options. In this paper, we characterize different perspectives, including performance and energy efficiency, in the context of NTC cloud processors by running emerging scale-out workloads. We find that NTC can improve performance by 1.6× and energy efficiency by 50%. We also show that a tiled-OoO architecture improves the performance of scale-out workloads by up to 3.7× and energy efficiency by up to 6× over alternative chip organizations, making it a preferable design paradigm for scale-out workloads. We believe that our observations will provide insights for the design of cloud processors in the era of dark silicon.

Citations: 3