2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)最新文献

Energy Consumption and Scalability Evaluation for Software Transactional Memory on a Real Computing Environment 真实计算环境下软件事务性内存的能耗与可扩展性评价

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.11

T. Rico, M. Pilla, A. R. D. Bois, R. M. Duarte

{"title":"Energy Consumption and Scalability Evaluation for Software Transactional Memory on a Real Computing Environment","authors":"T. Rico, M. Pilla, A. R. D. Bois, R. M. Duarte","doi":"10.1109/SBAC-PADW.2015.11","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2015.11","url":null,"abstract":"Transactional Memory is a concurrent programming abstraction that overcomes several of the limitations found in traditional synchronization mechanisms. As it is a more recent abstraction, little is known about energy consumption of Software Transactional Memories (STM). In this context, this work presents an analysis and characterization of energy consumption and performance of four Transactional Memory libraries: TL2, Tiny STM, Swiss TM, and Adapt STM, using the STAMP benchmarks. Although most works in the state-of-the-art chose to evaluate Transactional Memories through simulation, in this work the benchmarks are run in actual computers, avoiding the known issues with modeling power consumption in simulators. Our results show that Swiss TM is the most efficient library of the four in terms of energy consumption and performance for the default configurations, followed by Adapt STM, Tiny STM, and TL2, for most of the execution scenarios and 8 threads at most. STM's scalability is directly tied to the strategies for detection and resolution of conflicts. In this perspective, Adapt STM is the best STM for applications with short transactions, Swiss TM presents the best results for medium transactions, and long transactions with medium/high contention are best handled by TL2. On the other hand, Tiny STM shows the worst scalability for most scenarios, but with good results for applications with very small abort rates.","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"689 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123826997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Characterizing Anomalies of a Multicore ARMv7 Cluster with Parallel N-Body Simulations 基于并行n体仿真的ARMv7多核集群异常特性研究

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.18

J. L. Bez, L. Schnorr, P. Navaux

{"title":"Characterizing Anomalies of a Multicore ARMv7 Cluster with Parallel N-Body Simulations","authors":"J. L. Bez, L. Schnorr, P. Navaux","doi":"10.1109/SBAC-PADW.2015.18","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2015.18","url":null,"abstract":"ARM processors are beginning to gain attention from the HPC community due to its performance and energy efficiency characteristics. When developing HPC applications for such test beds developers assume that the computation resources available are homogeneous. However, we observed some anomalies when executing a relatively simple HPC application (an NBody simulation). One of the cores in all available nodes presented some variabilities in the computation time. This unexpected behavior was not observed on the second core of each node. In this paper, we aim at characterizing such anomalies, seen in a multicore ARMv7 8-node cluster. We also attempted to isolate and remove all possible interferences that could be contributing to this unexpected behavior, including compilation directives, dynamic processor frequency scaling and communication. Results show that such anomaly might be correlated with the architecture of the dual-core chip. We also analyze the effects of different deployments of MPI process in the total execution time and correlate them to the application and the test bed.","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131644069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On the Evaluation of Contention-Aware List Schedulers on Multicore Cluster 多核集群上竞争感知列表调度器的评估

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.19

Juliana Zamith, Thiago Silva, Lúcia M. A. Drummond, Cristina Boeres, C. Bentes

引用次数: 1

RadFlow: An Interest-Centric Task Based Dataflow Runtime RadFlow:一个以兴趣为中心的基于任务的数据流运行时

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.26

D. Dutra, Heberte F. Moraes, C. Amorim

引用次数: 2

Intra-Clustering: Accelerating On-chip Communication for Data Parallel Architectures 集群内:加速数据并行架构的片上通信

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.15

Wen Yuan, R. Boyapati, Lei Wang, Hyunjun Jang, Yuho Jin, K. H. Yum, Eun Jung Kim

{"title":"Intra-Clustering: Accelerating On-chip Communication for Data Parallel Architectures","authors":"Wen Yuan, R. Boyapati, Lei Wang, Hyunjun Jang, Yuho Jin, K. H. Yum, Eun Jung Kim","doi":"10.1109/SBAC-PADW.2015.15","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2015.15","url":null,"abstract":"Modern computation workloads contain abundant Data Level Parallelism (DLP), which requires specialized data parallel architectures, such as Graphics Processing Units (GPUs). With parallel programming models, such as CUDA and OpenCL, GPUs are easily to be programmed for non-graphics applications, and therefore become a cost effective approach for data parallel architectures. The large quantity of available parallelism places a heavy stress on the memory system as the limited number of pins confines the number of memory controllers on the chip. This creates a potential bottleneck for performance scalability of the GPUs. To accelerate communication with the memory system, we propose the Intra-Clustering on-chip network for data parallel architectures, which is built upon a traditional two-dimensional electrical mesh network with memory controllers connected through a nanophotonic ring and compute cores grouped into different clusters. Our evaluations with CUDA benchmarks show that the Intra-Clustering architecture can improve communication delay by an average of 17% (up to 32%) and IPC by an average of 5% (up to 11.5%).","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126868365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Graph Templates for Dataflow Programming 数据流编程的图形模板

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.20

A. Sena, Eduardo S. Vaz, F. França, L. A. J. Marzulo, Tiago A. O. Alves

引用次数: 2

Evaluating Overhead and Contention in Concurrent Accesses to a Graph 计算图并发访问的开销和争用

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.27

Israel da Silva Barbara, Nicolas O. de Araujo, A. R. D. Bois, G. H. Cavalheiro

引用次数: 0

A Parallel Implementation of Data Fusion Algorithm Using Gamma 一种基于Gamma的数据融合并行实现算法

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.25

Rui R. Mello Junior, Rubens H. P. de Almeida, F. França, G. Paillard

{"title":"A Parallel Implementation of Data Fusion Algorithm Using Gamma","authors":"Rui R. Mello Junior, Rubens H. P. de Almeida, F. França, G. Paillard","doi":"10.1109/SBAC-PADW.2015.25","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2015.25","url":null,"abstract":"In this paper we carried out designing and implementing of a target tracking data fusion algorithm based on a two stages graph solution using the computational model Gamma (General Abstract Model for Multiset mAnipulation). The proposed solution is the first parallel implementation of the method PPTS (Pairs of Plots in Two Stages). For this, we employed three Gamma implementations, where two of them exploited the resources of a parallel hardware environment, one using the MPI (Message Passing Interface) and the other one GPU (Graphics Processing Unit). Thus, the studied algorithm was evaluated from the parallelism exploited and finally was carried out a performance analysis of this algorithm in the three Gamma implementations used. The aim of this study is to provide an implementation on a real problem using for this the paradigm Gamma, which contributes to the implementations of the Gamma computational model, since it enables the performance analysis of these implementations and provides some suggestions for possible improvements. In addition, this work contributes to the PPTS method since it provides the parallelization of the first stage.","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127428777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Exploiting Parallelism in Linear Algebra Kernels through Dataflow Execution 通过数据流执行开发线性代数核的并行性

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.21

Brunno F. Goldstein, F. França, L. A. J. Marzulo, Tiago A. O. Alves

{"title":"Exploiting Parallelism in Linear Algebra Kernels through Dataflow Execution","authors":"Brunno F. Goldstein, F. França, L. A. J. Marzulo, Tiago A. O. Alves","doi":"10.1109/SBAC-PADW.2015.21","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2015.21","url":null,"abstract":"Linear Algebra Kernels have an important role in many petroleum reservoir simulators, extensively used by the industry. The growth in problem size, specially in pre-salt exploration, has caused an increase in execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever increasing number of cores may be an arduous task, as the programmer has to manage threads and care about synchronization issues. Current work on parallel programming models show that Dataflow Execution exploits parallelism in a natural way, allowing the programmer to focus solely on describing dependencies between portions of code. This work consists in implementing parallel Linear Algebra Kernels using the Dataflow model. The Trebuchet Dataflow Virtual Machine and the Sucuri Dataflow Library were used to evaluate the solutions with real inputs from reservoir simulators. Results have been compared with OpenMP and Intel Math Kernel Library and show that coarser-grained tasks are needed to hide the overheads of dataflow runtime environments. Therefore, level 2 and 3 linear algebra operations, such as Vector-Matrix and Matrix-Matrix products, presented the most promising results.","PeriodicalId":161685,"journal":{"name":"2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126177563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Impact of Version Management on Transactional Memories' Performance 版本管理对事务性内存性能的影响

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) Pub Date : 2015-10-18 DOI: 10.1109/SBAC-PADW.2015.14

Felipe L. Teixeira, M. Pilla, A. R. D. Bois, D. Mossé

引用次数: 2