2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Latest Publications

Resource-Management Study in HPC Runtime-Stacking Context
Arthur Loussert, Benoit Welterlen, Patrick Carribault, Julien Jaeger, Marc Pérache, R. Namyst
{"title":"Resource-Management Study in HPC Runtime-Stacking Context","authors":"Arthur Loussert, Benoit Welterlen, Patrick Carribault, Julien Jaeger, Marc Pérache, R. Namyst","doi":"10.1109/SBAC-PAD.2017.30","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.30","url":null,"abstract":"With the advent of multicore and manycore processors as building blocks of HPC supercomputers, many applications shift from relying solely on a distributed programming model (e.g., MPI) to mixing distributed and shared-memory models (e.g., MPI+OpenMP), to better exploit shared-memory communications and reduce the overall memory footprint. One side effect of this programming approach is runtime stacking: mixing multiple models requires various runtime libraries to be alive at the same time and to share the underlying computing resources. This paper explores different configurations where this stacking may appear and introduces algorithms to detect the misuse of compute resources when running a hybrid parallel application. We have implemented our algorithms inside a dynamic tool that monitors applications and outputs resource usage to the user. We validated this tool on applications from CORAL benchmarks. 
This leads to relevant information which can be used to improve runtime placement, and to an average overhead lower than 1% of total execution time.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126569861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Cloud Workload Prediction and Generation Models
Gilles Madi-Wamba, Yunbo Li, Anne-Cécile Orgerie, Nicolas Beldiceanu, Jean-Marc Menaud
{"title":"Cloud Workload Prediction and Generation Models","authors":"Gilles Madi-Wamba, Yunbo Li, Anne-Cécile Orgerie, Nicolas Beldiceanu, Jean-Marc Menaud","doi":"10.1109/SBAC-PAD.2017.19","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.19","url":null,"abstract":"Cloud computing allows for elasticity as users can dynamically benefit from new virtual resources when their workload increases. Such a feature requires highly reactive resource provisioning mechanisms. In this paper, we propose two new workload prediction models, based on constraint programming and neural networks, that can be used for dynamic resource provisioning in Cloud environments. We also present two workload trace generators that can help to extend an experimental dataset in order to test resource optimization heuristics more widely. Our models are validated using real traces from a small Cloud provider. Both approaches are shown to be complementary as neural networks give better prediction results, while constraint programming is more suitable for trace generation.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131582284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Beyond the Fog: Bringing Cross-Platform Code Execution to Constrained IoT Devices
F. Pisani, Jeferson Rech Brunetta, Vanderson Martins do Rosário, E. Borin
{"title":"Beyond the Fog: Bringing Cross-Platform Code Execution to Constrained IoT Devices","authors":"F. Pisani, Jeferson Rech Brunetta, Vanderson Martins do Rosário, E. Borin","doi":"10.1109/SBAC-PAD.2017.10","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.10","url":null,"abstract":"Considering the prediction that there will be over 50 billion devices connected to the Internet of Things (IoT) in the near future, the demand for efficient ways to process data streams generated by sensors grows ever larger, highlighting the necessity to re-evaluate current approaches, such as sending all data to the cloud for processing and analysis. In this paper, we explore one of the methods for improving this scenario: bringing the computation closer to data sources. By executing the code on the IoT devices themselves instead of on the network edge or the cloud, solutions can better meet the latency requirements of several applications, avoid problems with slow and intermittent network connections, prevent network congestion, and potentially save energy by reducing communication. To this end, we propose the LMC framework and compare it with Edgent, an open-source project that is under development by the Apache Incubator. By using a DragonBoard 410c to execute a simple filter, an outlier detector, and a program that calculates the FFT, we obtained results that indicate that LMC outperforms Edgent when dynamic translation is disabled for both of them and is more suitable for lightweight quick queries otherwise. 
More importantly, the LMC also enables us to perform cross-platform code execution on small, cheap devices that do not have enough resources to run Edgent, like the NodeMCU 1.0.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116869302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
GC-CR: A Decentralized Garbage Collector Component for Checkpointing in Clouds
Thouraya Louati, Heithem Abbes, C. Cérin, M. Jemni
{"title":"GC-CR: A Decentralized Garbage Collector Component for Checkpointing in Clouds","authors":"Thouraya Louati, Heithem Abbes, C. Cérin, M. Jemni","doi":"10.1109/SBAC-PAD.2017.20","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.20","url":null,"abstract":"Infrastructure-as-a-Service container-based virtualization technology is gaining significant interest in industry as an alternative platform for running distributed applications. With increasing scale of Cloud Computing architectures, faults are becoming a frequent occurrence. Checkpoint-Restart is a key method to survive failures in this context. However, there is a need to reduce the amount of checkpointing data as the Cloud is based on the pay-as-you-go model. This paper addresses the issue of garbage collection in LXCloud-CR and contributes a novel decentralized garbage collection component, GC-CR. LXCloud-CR, a decentralized Checkpoint-Restart model, is able to take snapshots of Linux Container instances and it uses replication to increase snapshot availability. LXCloud-CR contains a versioning scheme for each replica. Its disadvantage is that versioning causes snapshot availability issues as the number of useless files grows. GC-CR is a decentralized garbage collector (checkpoint deletion) component that attempts to identify and eliminate old snapshot versions from the system in order to free storage space. Large scale experiments on the Grid5000 testbed demonstrate the benefits of our proposal. 
Obtained results validate our model and show significant reduction of storage space consumption.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114378165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Scalability of CPU and GPU Solutions of the Prime Elliptic Curve Discrete Logarithm Problem
J. Panetta, P. S. Filho, Luiz A. F. Laranjeira, Carlos A. Teixeira
{"title":"Scalability of CPU and GPU Solutions of the Prime Elliptic Curve Discrete Logarithm Problem","authors":"J. Panetta, P. S. Filho, Luiz A. F. Laranjeira, Carlos A. Teixeira","doi":"10.1109/SBAC-PAD.2017.12","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.12","url":null,"abstract":"Elliptic curve asymmetric cryptography has achieved increased popularity due to its capability of providing levels of security comparable to other existing cryptographic systems while requiring less computational work. Pollard Rho and Parallel Collision Search, the fastest known sequential and parallel algorithms for breaking this cryptographic system, have been successfully applied over time to break ever-increasing bit-length system instances using implementations heavily optimized for the available hardware. This work presents portable, general implementations of a Parallel Collision Search based solution for prime elliptic curve asymmetric cryptographic systems that use publicly available big integer libraries and make no assumption on prime curve properties. It investigates which bit-length keys can be broken in reasonable time by a user that has access to state-of-the-art public HPC equipment with CPUs and GPUs. The final implementation breaks a 79-bit system in about two hours using 80 GPUs and a 94-bit system in about 15 hours using 256 GPUs. Extensive experimentation investigates scalability of CPU, GPU and CPU+GPU runs. The discussed results indicate that speed-up is not a good metric for parallel scalability. 
This paper proposes and evaluates a new metric that is better suited for this task.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124843895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems
Borja Pérez, Esteban Stafford, J. L. Bosque, R. Beivide, Sergi Mateo, Xavier Teruel, X. Martorell, E. Ayguadé
{"title":"Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems","authors":"Borja Pérez, Esteban Stafford, J. L. Bosque, R. Beivide, Sergi Mateo, Xavier Teruel, X. Martorell, E. Ayguadé","doi":"10.1109/SBAC-PAD.2017.8","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.8","url":null,"abstract":"Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well-known framework for task-based parallel applications, which is an interesting tool to simplify the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that meets two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load balancing algorithm or by the well-known Static and Dynamic algorithms. All this is accomplished with negligible impact on the programming. Experimental results reveal that there is always one load balancing algorithm that improves the performance and energy consumption of the system.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129654589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Exploiting Data Compression to Mitigate Aging in GPU Register Files
F. Candel, A. Valero, S. Petit, D. S. Gracia, J. Sahuquillo
{"title":"Exploiting Data Compression to Mitigate Aging in GPU Register Files","authors":"F. Candel, A. Valero, S. Petit, D. S. Gracia, J. Sahuquillo","doi":"10.1109/SBAC-PAD.2017.15","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.15","url":null,"abstract":"Nowadays, GPUs sit at the forefront of high-performance computing thanks to their massive computational capabilities. Internally, thousands of functional units, architected to be fed by large register files, fuel such a performance. At nanometer technologies, the SRAM cells that implement register files suffer the Negative Bias Temperature Instability (NBTI) effect, which degrades the transistor threshold voltage Vth and, in turn, can make cells faulty or unreliable when they hold the same logic value for long periods of time. Fortunately, the GPU single-thread multiple-data execution model writes data in recognizable patterns. This work proposes mechanisms to detect those patterns, and to compress and shuffle the data, so that compressed register file entries can be safely powered off, mitigating NBTI aging. Experimental results show that a conventional GPU register file experiences the worst case for NBTI, since it maintains cells with a single logic value during the entire application execution (i.e., a 100% 0 and 1 duty cycle distributions). 
On average, the proposal reduces these distributions by 61% and 72%, respectively, which translates into Vth degradation savings by 57% and 64%, respectively.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130829937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Addressing Energy Challenges in Filter Caches
Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer
{"title":"Addressing Energy Challenges in Filter Caches","authors":"Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer","doi":"10.1109/SBAC-PAD.2017.14","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.14","url":null,"abstract":"Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the way predicted, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions. In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache; and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization does not add any extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor. By combining the low power benefits of a physically smaller L0 with the reduction in miss energy by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. 
Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130163401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Exploring Heterogeneous Mobile Architectures with a High-Level Programming Model
W. D. C. Moreira, Guilherme Andrade, Pedro Caldeira, Renato Utsch Goncalves, R. Ferreira, L. Rocha, Renan de Carvalho Sousa, Millas Nasser Ramsses Avelar
{"title":"Exploring Heterogeneous Mobile Architectures with a High-Level Programming Model","authors":"W. D. C. Moreira, Guilherme Andrade, Pedro Caldeira, Renato Utsch Goncalves, R. Ferreira, L. Rocha, Renan de Carvalho Sousa, Millas Nasser Ramsses Avelar","doi":"10.1109/SBAC-PAD.2017.11","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.11","url":null,"abstract":"The development of new technologies is setting a new era characterized, among other factors, by the rise of sophisticated mobile devices containing CPUs and GPUs. This emerging scenario of heterogeneous mobile architectures brings challenging issues regarding the use of the available computing resources. Such issues are mainly related to the intrinsic complexity of coordinating these processors in order to increase application performance. In this sense, this paper presents a high-level programming model to implement parallel patterns that can be executed in a coordinated way by heterogeneous mobile architectures. A comparative analysis of performance and programming complexity is presented, contrasting code generated automatically from the proposed programming model with low-level manually-optimized implementations.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126424520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Online Multimedia Similarity Search with Response Time-Aware Parallelism and Task Granularity Auto-Tuning
Guilherme Andrade, George Teodoro, R. Ferreira
{"title":"Online Multimedia Similarity Search with Response Time-Aware Parallelism and Task Granularity Auto-Tuning","authors":"Guilherme Andrade, George Teodoro, R. Ferreira","doi":"10.1109/SBAC-PAD.2017.27","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2017.27","url":null,"abstract":"This paper presents an efficient parallel implementation of the Product Quantization based approximate nearest neighbor multimedia similarity search indexing (PQANNS). The parallel PQANNS efficiently answers nearest neighbor queries by exploiting the ability of the quantization approach to reduce the data dimensionality (and memory demand) and by leveraging parallelism to speed up the search capabilities of the application. Our solution is also optimized to minimize query response times under scenarios with fluctuating query rates (load) as observed in online services. To achieve this goal, we have developed strategies to dynamically select the parallelism configuration and task granularity that minimize the query response times during the execution. The proposed strategies (ADAPT and ADAPT+G) were thoroughly evaluated and have been shown, for instance, to reduce the query response times by 6.4x as compared to the best static configuration of parallelism and task granularity.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130969413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2