2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP): Latest Publications

Bio-Inspired Call-Stack Reconstruction for Performance Analysis
Harald Servat, Germán Llort, Juan Gonzalez, Judit Giménez, Jesús Labarta
DOI: 10.1109/PDP.2016.62 (https://doi.org/10.1109/PDP.2016.62) | Published: 2016-04-04
Abstract: The correlation of performance bottlenecks with their associated source code has become a cornerstone of performance analysis. It allows understanding why the efficiency of an application falls behind the computer's peak performance and ultimately enables optimizing the code. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst to comprehend the application behavior. Some tools modify the call-stack at run-time to reduce the collection cost, but at the price of non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterparts. We capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several previously unseen in-production applications to characterize them in detail and optimize them with small modifications based on the analyses.
Citations: 5
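The core idea above is to align short call-stack samples the way bioinformatics aligns sequences. As a rough illustration of that idea (not the authors' algorithm, which aligns many segments at once), the following Python sketch aligns two sampled call-stack segments with a Needleman-Wunsch-style dynamic program; the frame names and scoring weights are assumptions chosen for the example.

```python
# Minimal sketch (not the authors' algorithm): global alignment of two
# call-stack segments with a Needleman-Wunsch-style dynamic program.
# Frame names and scores below are illustrative assumptions.

def align_stacks(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best alignment score and the aligned frame sequences."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    # Trace back to recover the aligned sequences.
    ai, bj, out_a, out_b = n, m, [], []
    while ai > 0 or bj > 0:
        if ai > 0 and bj > 0 and score[ai][bj] == score[ai-1][bj-1] + (match if a[ai-1] == b[bj-1] else mismatch):
            out_a.append(a[ai-1]); out_b.append(b[bj-1]); ai -= 1; bj -= 1
        elif ai > 0 and score[ai][bj] == score[ai-1][bj] + gap:
            out_a.append(a[ai-1]); out_b.append("-"); ai -= 1
        else:
            out_a.append("-"); out_b.append(b[bj-1]); bj -= 1
    return score[n][m], out_a[::-1], out_b[::-1]

# Two sampled call-stack segments (up to three levels), hypothetical frames:
s1 = ["main", "solve", "dgemm"]
s2 = ["main", "dgemm"]
print(align_stacks(s1, s2))   # aligns "solve" against a gap
```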
Evaluation of Splitting-Up Conjugate Gradient Method on GPUs
A. Wakatani
DOI: 10.1109/PDP.2016.9 (https://doi.org/10.1109/PDP.2016.9) | Published: 2016-04-04
Abstract: This paper describes the implementation of a preconditioned CG (Conjugate Gradient) method on GPUs and evaluates its performance against CPUs. Our CG method uses the SP (Splitting-Up) preconditioner, which is suitable for parallel processing because all dimensions except one are independent. To improve the effective bandwidth to the GPU's global memory, our implementation performs a pseudo matrix transposition before and after the tridiagonal matrix solver, resulting in coalesced memory accesses. In addition, the number of pseudo matrix transpositions can be reduced to only one by using a rotation configuration technique. With these techniques, the speedup of our approach is improved by up to 102.2%.
Citations: 0
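For readers unfamiliar with the method being accelerated, the sketch below shows a generic preconditioned CG loop in NumPy. It is a minimal CPU illustration, not the paper's GPU implementation: a simple Jacobi (diagonal) preconditioner stands in for the Splitting-Up preconditioner, and the transposition and rotation techniques are not reproduced.

```python
# Minimal sketch of a preconditioned CG loop (NumPy, CPU). A Jacobi
# (diagonal) preconditioner stands in for M^{-1}; the paper's SP
# preconditioner and GPU-specific optimizations are not reproduced.
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                # residual
    z = M_inv @ r                # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test system (illustrative):
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
M_inv = np.diag(1.0 / np.diag(A))   # Jacobi preconditioner
print(pcg(A, b, M_inv))             # expect roughly [0.0909, 0.6364]
```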
Black-Box Optimization of Hadoop Parameters Using Derivative-Free Optimization
Diego Desani, V. Gil-Costa, C. Marcondes, H. Senger
DOI: 10.1109/PDP.2016.35 (https://doi.org/10.1109/PDP.2016.35) | Published: 2016-04-04
Abstract: Since its inception in 2004, MapReduce has emerged as a paramount platform and disruptive technology for executing high-performance applications that process very large volumes of data. Hadoop is one of the most popular and widely adopted open-source MapReduce implementations. Companies that run large applications over hundreds or thousands of machines every day spend considerable effort on performance tuning and optimization to reduce infrastructure costs. However, the framework has around 190 parameters that can be adjusted into a large number of different configurations, which can significantly impact application performance. Optimizing Hadoop parameters requires deep knowledge of a myriad of platform details. In this paper, we propose and evaluate the use of derivative-free optimization (DFO) methods for the automatic setup of Hadoop parameters to optimize application performance. DFO methods provide a simple and efficient way to automatically optimize Hadoop MapReduce programs. Parameter changes are deployed through DevOps tools, which efficiently reconfigure the cluster according to the DFO decisions. In the best scenario in our experiments, the automatic optimization reduces execution time by 71% over the default parameter setup (i.e., a speedup of 3.5 times) on a cluster of 28 nodes, with very low overhead for production environments. These results show that DFO methods and automatic optimization are a promising tool for optimizing performance and reducing costs for Hadoop applications whose behavior does not vary dramatically in daily production environments.
Citations: 6
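The approach treats the Hadoop job as a black box whose execution time is minimized by a derivative-free optimizer. A minimal sketch of that loop is shown below, assuming a hypothetical run_job() that deploys a configuration and returns the measured runtime; Nelder-Mead is used here as a representative derivative-free method, not necessarily the one the authors chose.

```python
# Minimal sketch of derivative-free black-box tuning (not the authors' exact
# pipeline). run_job() and the two parameters shown are hypothetical
# placeholders for "deploy this configuration, run the workload, and return
# its execution time".
from scipy.optimize import minimize

def run_job(config):
    """Placeholder: deploy config (e.g. via a DevOps tool), run the MapReduce
    job, and return the measured execution time in seconds."""
    sort_mb, reduce_tasks = config
    # Synthetic stand-in cost surface so the sketch is runnable:
    return (sort_mb - 200) ** 2 / 1000.0 + (reduce_tasks - 16) ** 2 / 10.0 + 120.0

def objective(x):
    # x[0] ~ a buffer size in MB, x[1] ~ number of reduce tasks (assumed knobs)
    return run_job((x[0], x[1]))

# Nelder-Mead is a classic derivative-free method; start from default values.
result = minimize(objective, x0=[100.0, 4.0], method="Nelder-Mead")
print(result.x, result.fun)   # tuned parameters and predicted runtime
```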
Parallel Improved Schnorr-Euchner Enumeration SE++ for the CVP and SVP
Fábio Correia, Artur Mariano, A. Proença, C. Bischof, E. Agrell
DOI: 10.1109/PDP.2016.95 (https://doi.org/10.1109/PDP.2016.95) | Published: 2016-04-04
Abstract: The Closest Vector Problem (CVP) and the Shortest Vector Problem (SVP) are prime problems in lattice-based cryptanalysis, since they underpin the security of many lattice-based cryptosystems. Despite the importance of these problems, there are only a few CVP-solvers publicly available, and their scalability has never been studied. This paper presents a scalable implementation of an enumeration-based CVP-solver for multi-cores, which can be easily adapted to solve the SVP. In particular, it achieves super-linear speedups in some instances on up to 8 cores and almost linear speedups on 16 cores when solving the CVP on a 50-dimensional lattice. Our results show that enumeration-based CVP-solvers can be parallelized as effectively as enumeration-based solvers for the SVP, based on a comparison with a state-of-the-art SVP-solver. In addition, we show that we can optimize the SVP variant of our solver in such a way that it becomes 35%-60% faster than the fastest enumeration-based SVP-solver to date.
Citations: 10
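To make the problem concrete, the sketch below solves the CVP by exhaustive enumeration on a tiny two-dimensional lattice. It only illustrates what an enumeration-based CVP-solver computes; the Schnorr-Euchner strategy prunes this search far more cleverly, and the basis, target, and radius are made-up values.

```python
# Minimal sketch (not Schnorr-Euchner): a brute-force CVP solver for a tiny
# lattice, just to illustrate what an enumeration-based solver computes.
# The basis, target, and search radius R are illustrative assumptions.
import itertools
import numpy as np

def brute_force_cvp(basis, target, R=5):
    """Enumerate integer coefficient vectors in [-R, R]^n and return the
    lattice vector closest to the target."""
    n = basis.shape[1]
    best_vec, best_dist = None, float("inf")
    for coeffs in itertools.product(range(-R, R + 1), repeat=n):
        v = basis @ np.array(coeffs)
        d = np.linalg.norm(v - target)
        if d < best_dist:
            best_vec, best_dist = v, d
    return best_vec, best_dist

basis = np.array([[2.0, 0.0],
                  [1.0, 3.0]])      # columns are the basis vectors
target = np.array([3.2, 2.7])
print(brute_force_cvp(basis, target))
```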
Row Key Designs of NoSQL Database Tables and Their Impact on Write Performance
Eftim Zdravevski, Petre Lameski, A. Kulakov
DOI: 10.1109/PDP.2016.84 (https://doi.org/10.1109/PDP.2016.84) | Published: 2016-04-04
Abstract: In several NoSQL database systems, among which is HBase, only one index is available per table: the row key, which is also the clustered index. Secondary indexes are not available out of the box. As a result, the row key design is the most important decision when designing tables, because an inappropriate design can have detrimental consequences on performance and costs. Particular row key designs are suitable for different problems, and in this paper we analyze the performance, characteristics, and applicability of each of them. In particular, we investigate the effect of various techniques for modeling row keys: sequences, salting, padding, hashing, and modulo operations. We propose four different designs based on these techniques and analyze their performance on different HBase clusters when loading HDFS files of various sizes. The experiments show that particular designs consistently outperform others on differently sized clusters, both in execution time and in even load distribution across nodes.
Citations: 8
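As a concrete illustration of the techniques listed in the abstract, the sketch below builds HBase-style row keys with sequential, salted, hashed, and modulo-based layouts. The specific key formats and bucket count are illustrative assumptions, not the four designs evaluated in the paper.

```python
# Minimal sketch of a few row-key construction techniques mentioned in the
# abstract (sequences, salting, padding, hashing, modulo). The key layouts
# and bucket count are illustrative assumptions.
import hashlib

NUM_BUCKETS = 16   # assumed number of salt buckets / region prefixes

def sequential_key(record_id: int) -> bytes:
    # Plain sequence, zero-padded so keys sort lexicographically by id.
    return f"{record_id:012d}".encode()

def salted_key(record_id: int) -> bytes:
    # Prefix with a salt derived from the id so consecutive writes spread
    # across NUM_BUCKETS regions instead of hammering a single region server.
    salt = int(hashlib.md5(str(record_id).encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{salt:02d}-{record_id:012d}".encode()

def hashed_key(record_id: int) -> bytes:
    # Full hash of the id: uniform distribution, but range scans by id are lost.
    return hashlib.md5(str(record_id).encode()).hexdigest().encode()

def modulo_key(record_id: int) -> bytes:
    # Bucket by modulo, keep the id as the suffix for per-bucket ordering.
    return f"{record_id % NUM_BUCKETS:02d}-{record_id:012d}".encode()

for rid in (1000, 1001, 1002):
    print(sequential_key(rid), salted_key(rid), modulo_key(rid))
```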
Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency
P. C. Santos, M. Alves, M. Diener, L. Carro, P. Navaux
DOI: 10.1109/PDP.2016.55 (https://doi.org/10.1109/PDP.2016.55) | Published: 2016-04-04
Abstract: One of the main challenges for computer architects is how to hide the high average memory access latency from the processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations. However, it is not clear how this reduced average memory access latency will impact the last-level cache (LLC). For applications with high cache miss ratios, the latency of searching for data inside the cache memory negatively impacts performance, and the importance of this overhead depends on the memory access latency. In this paper, we evaluate the importance of the L3 cache on a high-performance processor using HMC, also exploring chip-area tradeoffs between cache size and the number of processor cores. We show that the high bandwidth provided by HMC memories can eliminate the need for L3 caches, removing hardware and making room for more processing power. Our evaluations show that, compared to DDR3 memories, performance increases by 37% and the EDP (energy-delay product) improves by 12% while maintaining the same original chip area across a wide range of parallel applications.
Citations: 13
Avionics Applications on a Time-Predictable Chip-Multiprocessor
André Rocha, Cláudio Silva, R. B. Sorensen, J. Sparsø, Martin Schoeberl
DOI: 10.1109/PDP.2016.36 (https://doi.org/10.1109/PDP.2016.36) | Published: 2016-04-04
Abstract: Avionics applications need to be certified for the highest criticality standard. This certification includes schedulability analysis and worst-case execution time (WCET) analysis. WCET analysis is only possible when the software is written to be WCET analyzable and when the platform is time-predictable. In this paper we present prototype avionics applications that have been ported to the time-predictable T-CREST platform. The applications are WCET analyzable, and T-CREST is supported by the aiT WCET analyzer. This combination allows us to provide WCET bounds of avionic tasks, even when executing on a multicore processor.
Citations: 6
MWPF: A Deadlock Avoidance Fully Adaptive Routing Algorithm in Networks-on-Chip
Kamran Nasiri, H. Zarandi
DOI: 10.1109/PDP.2016.69 (https://doi.org/10.1109/PDP.2016.69) | Published: 2016-04-04
Abstract: Fully adaptive routing algorithms for Networks-on-Chip (NoC), classified by the number of packets held in a virtual channel (VC), fall into two main groups: 1) traditional fully adaptive routing algorithms, in which only one packet resides in a VC at a time, and 2) whole packet forwarding (WPF), in which multiple packets can reside in a VC. Our analysis shows that WPF, because multiple packets can be held in a VC, suffers from the full output buffer problem, which increases the overall input packet latency. In this paper, a fully adaptive routing algorithm (MWPF) is presented. Compared with TFA and WPF, our design achieves average latency improvements of 65.3% and 35.4%, respectively, and saturation throughput improvements of 38.4% and 24.3% on standard synthetic traffic patterns. Compared with WPF, it achieves 26% average and 61% maximum latency reductions on SPLASH-2 benchmarks running on a 49-core CMP. Our design also offers higher performance than partially adaptive and deterministic routing algorithms.
Citations: 1
VerCors: A Layered Approach to Practical Verification of Concurrent Software
A. Amighi, S. Blom, M. Huisman
DOI: 10.1109/PDP.2016.107 (https://doi.org/10.1109/PDP.2016.107) | Published: 2016-04-04
Abstract: This paper discusses how several concurrent program verification techniques can be combined in a layered approach, where each layer is especially suited to verify one aspect of concurrent programs, thus making verification of concurrent programs practical. At the bottom layer, we use a combination of implicit dynamic frames and CSL-style resource invariants to reason about data race freedom of programs. We illustrate this on the verification of a lock-free queue implementation. On top of this, layer 2 enables reasoning about resource invariants that express a relationship between thread-local and shared variables. This is illustrated by the verification of a reentrant lock implementation, where thread-locality is used to specify for a thread which locks it holds, while there is a global notion of ownership, expressing for a lock by which thread it is held. Finally, the top layer adds a notion of histories to reason about functional properties. We illustrate how this is used to prove that the lock-free queue preserves the order of elements, without having to reverify the aspects related to data race freedom.
Citations: 18
A General Purpose Branch and Bound Parallel Algorithm
A. Dimopoulos, C. Pavlatos, G. Papakonstantinou
DOI: 10.1109/PDP.2016.33 (https://doi.org/10.1109/PDP.2016.33) | Published: 2016-04-04
Abstract: In this paper a parallel algorithm for branch-and-bound applications is proposed. The algorithm is general purpose and can be used to effortlessly parallelize any sequential branch-and-bound-style algorithm that is written in a certain format. It is a distributed dynamic scheduling algorithm, i.e., each node schedules the load of its own cores; it can be used with different programming platforms and architectures and is a hybrid (OpenMP, MPI) algorithm. To prove its validity and efficiency, the proposed algorithm has been implemented and tested on numerous examples, which are described in detail in this paper. A speedup of about 9 has been achieved for the tested examples on a cluster of three nodes with four cores each.
Citations: 0
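For context, the sketch below shows the sequential branch-and-bound pattern such a framework targets, using the 0/1 knapsack problem as the example. It is not the paper's algorithm or input format; in a parallel version, the subproblems pushed onto the queue would be distributed across nodes and cores while sharing the best bound.

```python
# Minimal sketch (not the paper's framework): sequential branch and bound for
# the 0/1 knapsack problem, showing the structure such a framework would
# parallelize (independent subtrees explored under a shared best bound).
import heapq

def knapsack_bnb(values, weights, capacity):
    n = len(values)
    # Sort items by value density for the fractional-relaxation bound.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(idx, value, room):
        # Optimistic bound: greedily fill the remaining room, fractionally at the end.
        b = value
        for i in order[idx:]:
            if weights[i] <= room:
                room -= weights[i]
                b += values[i]
            else:
                b += values[i] * room / weights[i]
                break
        return b

    best = 0
    # Max-heap on the bound (negated); each entry is a subproblem (node).
    heap = [(-bound(0, 0, capacity), 0, 0, capacity)]
    while heap:
        neg_b, idx, value, room = heapq.heappop(heap)
        if -neg_b <= best or idx == n:   # prune, or record a complete solution
            best = max(best, value)
            continue
        i = order[idx]
        if weights[i] <= room:           # branch: take item i
            heapq.heappush(heap, (-bound(idx + 1, value + values[i], room - weights[i]),
                                  idx + 1, value + values[i], room - weights[i]))
        # branch: skip item i
        heapq.heappush(heap, (-bound(idx + 1, value, room), idx + 1, value, room))
    return best

print(knapsack_bnb(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
```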