2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Latest Publications

Resource-Management Study in HPC Runtime-Stacking Context

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-27 DOI: 10.1109/SBAC-PAD.2017.30

Arthur Loussert, Benoit Welterlen, Patrick Carribault, Julien Jaeger, Marc Pérache, R. Namyst

Abstract: With the advent of multicore and manycore processors as building blocks of HPC supercomputers, many applications shift from relying solely on a distributed programming model (e.g., MPI) to mixing distributed and shared-memory models (e.g., MPI+OpenMP), to better exploit shared-memory communications and reduce the overall memory footprint. One side effect of this programming approach is runtime stacking: mixing multiple models requires several runtime libraries to be alive at the same time and to share the underlying computing resources. This paper explores different configurations where this stacking may appear and introduces algorithms to detect the misuse of compute resources when running a hybrid parallel application. We have implemented our algorithms inside a dynamic tool that monitors applications and outputs resource usage to the user. We validated this tool on applications from the CORAL benchmarks. The tool yields relevant information that can be used to improve runtime placement, with an average overhead lower than 1% of total execution time.
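The abstract above does not spell out its detection algorithms. As a rough illustration of what "misuse of compute resources" can mean under runtime stacking, the sketch below flags cores that have several pinned threads and cores that no thread may use. The function name `audit_placement`, the thread labels, and the input format are all hypothetical; a real tool would obtain bindings from the operating system or the runtimes themselves.

```python
from collections import defaultdict

def audit_placement(bindings, num_cores):
    """Report oversubscribed and idle cores (illustrative sketch only).

    bindings: dict mapping a thread label to the set of core ids it may run on
              (hypothetical input format, not taken from the paper).
    num_cores: number of cores on the node.
    """
    load = defaultdict(list)
    for thread, cores in bindings.items():
        # Only a thread pinned to exactly one core has a definite location.
        if len(cores) == 1:
            load[next(iter(cores))].append(thread)

    # Cores hosting more than one pinned thread are oversubscribed.
    oversubscribed = {c: sorted(t) for c, t in load.items() if len(t) > 1}

    # Cores that appear in no thread's allowed set can never be used.
    allowed = set().union(*bindings.values())
    idle = sorted(set(range(num_cores)) - allowed)
    return oversubscribed, idle

# Two MPI ranks, each with two OpenMP threads, both pinned to cores 0 and 1.
bindings = {
    "rank0.omp0": {0}, "rank0.omp1": {1},
    "rank1.omp0": {0}, "rank1.omp1": {1},
}
over, idle = audit_placement(bindings, num_cores=4)
print(over)  # {0: ['rank0.omp0', 'rank1.omp0'], 1: ['rank0.omp1', 'rank1.omp1']}
print(idle)  # [2, 3]
```

In this made-up case both ranks were given the same core set, so half the node stays idle while the other half is shared, which is the kind of placement problem a monitoring tool would want to surface.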

Citations: 1

Cloud Workload Prediction and Generation Models

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-17 DOI: 10.1109/SBAC-PAD.2017.19

Gilles Madi-Wamba, Yunbo Li, Anne-Cécile Orgerie, Nicolas Beldiceanu, Jean-Marc Menaud

Citations: 24

Beyond the Fog: Bringing Cross-Platform Code Execution to Constrained IoT Devices

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.10

F. Pisani, Jeferson Rech Brunetta, Vanderson Martins do Rosário, E. Borin

Abstract: Considering the prediction that there will be over 50 billion devices connected to the Internet of Things (IoT) in the near future, the demand for efficient ways to process data streams generated by sensors grows ever larger, highlighting the necessity to re-evaluate current approaches, such as sending all data to the cloud for processing and analysis. In this paper, we explore one of the methods for improving this scenario: bringing the computation closer to data sources. By executing the code on the IoT devices themselves instead of on the network edge or the cloud, solutions can better meet the latency requirements of several applications, avoid problems with slow and intermittent network connections, prevent network congestion, and potentially save energy by reducing communication. To this end, we propose the LMC framework and compare it with Edgent, an open-source project under development in the Apache Incubator. By using a DragonBoard 410c to execute a simple filter, an outlier detector, and a program that calculates the FFT, we obtained results indicating that LMC outperforms Edgent when dynamic translation is disabled for both of them and is more suitable for lightweight quick queries otherwise. More importantly, LMC also enables us to perform cross-platform code execution on small, cheap devices that do not have enough resources to run Edgent, such as the NodeMCU 1.0.

Citations: 14

GC-CR: A Decentralized Garbage Collector Component for Checkpointing in Clouds

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.20

Thouraya Louati, Heithem Abbes, C. Cérin, M. Jemni

Abstract: Infrastructure-as-a-Service container-based virtualization technology is gaining significant interest in industry as an alternative platform for running distributed applications. With the increasing scale of Cloud Computing architectures, faults are becoming a frequent occurrence. Checkpoint-Restart is a key method to survive failures in this context. However, there is a need to reduce the amount of checkpointing data, as the Cloud is based on the pay-as-you-go model. This paper addresses the issue of garbage collection in LXCloud-CR and contributes a novel decentralized garbage collection component, GC-CR. LXCloud-CR, a decentralized Checkpoint-Restart model, is able to take snapshots of Linux Container instances, and it uses replication to increase snapshot availability. LXCloud-CR contains a versioning scheme for each replica. A drawback of this versioning is that the number of useless snapshot files grows over time. GC-CR is a decentralized garbage collector (checkpoint deletion) component that attempts to identify and eliminate old snapshot versions from the system in order to free storage space. Large-scale experiments on the Grid5000 testbed demonstrate the benefits of our proposal. The results obtained validate our model and show a significant reduction in storage space consumption.

Citations: 4

Scalability of CPU and GPU Solutions of the Prime Elliptic Curve Discrete Logarithm Problem

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.12

J. Panetta, P. S. Filho, Luiz A. F. Laranjeira, Carlos A. Teixeira

Abstract: Elliptic curve asymmetric cryptography has achieved increased popularity due to its capability of providing levels of security comparable to other existing cryptographic systems while requiring less computational work. Pollard Rho and Parallel Collision Search, the fastest known sequential and parallel algorithms for breaking this cryptographic system, have been successfully applied over time to break system instances of ever-increasing bit length, using implementations heavily optimized for the available hardware. This work presents portable, general implementations of a Parallel Collision Search based solution for prime elliptic curve asymmetric cryptographic systems that use publicly available big integer libraries and make no assumption on prime curve properties. It investigates which key bit lengths can be broken in reasonable time by a user who has access to state-of-the-art public HPC equipment with CPUs and GPUs. The final implementation breaks a 79-bit system in about two hours using 80 GPUs and a 94-bit system in about 15 hours using 256 GPUs. Extensive experimentation investigates the scalability of CPU, GPU and CPU+GPU runs. The discussed results indicate that speed-up is not a good metric for parallel scalability. This paper proposes and evaluates a new metric that is better suited for this task.
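For readers unfamiliar with the sequential algorithm named in the abstract above, here is a minimal sketch of Pollard's rho for the elliptic curve discrete logarithm. It is not the paper's implementation and is not parallel. It assumes a textbook-sized toy curve, y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of prime order 19, a three-way partition of the walk, and Floyd cycle detection; all function names are illustrative. It needs Python 3.8 or later for modular inverses via `pow`.

```python
import random

# Toy parameters (assumed for illustration): y^2 = x^3 + A*x + B over F_P_MOD.
P_MOD, A, B = 17, 2, 2
G = (5, 1)   # base point
N = 19       # prime order of G

def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                      # p2 is the inverse of p1
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD   # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD          # chord slope
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def mul(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def step(r, a, b, q):
    """One walk step that preserves the invariant r == a*G + b*q."""
    zone = 0 if r is None else r[0] % 3
    if zone == 0:
        return add(r, G), (a + 1) % N, b
    if zone == 1:
        return add(r, r), (2 * a) % N, (2 * b) % N
    return add(r, q), a, (b + 1) % N

def pollard_rho(q):
    """Return k in [0, N) with k*G == q, retrying on degenerate collisions."""
    while True:
        a, b = random.randrange(N), random.randrange(N)
        tort = hare = (add(mul(a, G), mul(b, q)), a, b)
        while True:                      # Floyd: hare moves two steps per tortoise step
            tort = step(*tort, q)
            hare = step(*step(*hare, q), q)
            if tort[0] == hare[0]:
                break
        db = (hare[2] - tort[2]) % N
        if db == 0:
            continue                     # collision carries no information; restart
        k = (tort[1] - hare[1]) * pow(db, -1, N) % N
        if mul(k, G) == q:
            return k

secret = 7
assert pollard_rho(mul(secret, G)) == secret
```

A collision a1*G + b1*Q = a2*G + b2*Q gives k = (a1 - a2) / (b2 - b1) mod N whenever b1 and b2 differ. Parallel Collision Search, as named in the abstract, distributes many such walks across workers; that part is not shown here.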

Citations: 0

Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.8

Borja Pérez, Esteban Stafford, J. L. Bosque, R. Beivide, Sergi Mateo, Xavier Teruel, X. Martorell, E. Ayguadé

Citations: 2

Exploiting Data Compression to Mitigate Aging in GPU Register Files

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.15

F. Candel, A. Valero, S. Petit, D. S. Gracia, J. Sahuquillo

Abstract: Nowadays, GPUs sit at the forefront of high-performance computing thanks to their massive computational capabilities. Internally, thousands of functional units, architected to be fed by large register files, fuel such performance. At nanometer technologies, the SRAM cells that implement register files suffer from the Negative Bias Temperature Instability (NBTI) effect, which degrades the transistor threshold voltage Vth and, in turn, can make cells faulty or unreliable when they hold the same logic value for long periods of time. Fortunately, the GPU single-thread multiple-data execution model writes data in recognizable patterns. This work proposes mechanisms to detect those patterns, and to compress and shuffle the data, so that compressed register file entries can be safely powered off, mitigating NBTI aging. Experimental results show that a conventional GPU register file experiences the worst case for NBTI, since it maintains cells with a single logic value during the entire application execution (i.e., 100% "0" and "1" duty-cycle distributions). On average, the proposal reduces these distributions by 61% and 72%, respectively, which translates into Vth degradation savings of 57% and 64%, respectively.
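The duty cycle mentioned in the abstract above is the fraction of time a memory cell holds a given logic value. A cell stuck at one value for the whole run (duty cycle of 0.0 or 1.0 for logic 1) is the worst case for NBTI. The sketch below computes per-bit duty cycles from a hypothetical register trace; the trace format and the function name are assumptions made for illustration, not the paper's methodology.

```python
def duty_cycles(history, width=32):
    """Per-bit fraction of time spent holding logic 1.

    history: list of (value, cycles) pairs, meaning the register held `value`
             for `cycles` clock cycles (hypothetical trace format).
    width:   number of bit cells in the register.
    """
    total = sum(cycles for _, cycles in history)
    ones = [0] * width
    for value, cycles in history:
        for bit in range(width):
            if (value >> bit) & 1:
                ones[bit] += cycles      # this cell stored a 1 for `cycles` cycles
    return [n / total for n in ones]

# A 4-bit register holds 0b0001 for 75 cycles, then 0b0011 for 25 cycles.
print(duty_cycles([(0b0001, 75), (0b0011, 25)], width=4))
# [1.0, 0.25, 0.0, 0.0]  (bit 0 first)
```

In this example bits 0, 2 and 3 never change and therefore sit at the worst case, while bit 1 spends a quarter of the time at logic 1.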

Citations: 1

Online Multimedia Similarity Search with Response Time-Aware Parallelism and Task Granularity Auto-Tuning

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.27

Guilherme Andrade, George Teodoro, R. Ferreira

Citations: 2

Exploring Heterogeneous Mobile Architectures with a High-Level Programming Model

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.11

W. D. C. Moreira, Guilherme Andrade, Pedro Caldeira, Renato Utsch Goncalves, R. Ferreira, L. Rocha, Renan de Carvalho Sousa, Millas Nasser Ramsses Avelar

Citations: 4

Addressing Energy Challenges in Filter Caches

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI: 10.1109/SBAC-PAD.2017.14

Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer

Abstract: Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the predicted way, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions. In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache, and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization does not add any extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor. By combining the low-power benefits of a physically smaller L0 with the reduction in miss energy obtained by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.

Citations: 5