2017 International Conference on High Performance Computing & Simulation (HPCS): Latest Papers, Page 5

On Determining Multiple Optimal Parenthesizations for Matrix Chain Products and Scheduling the Corresponding Task Graphs

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.61

Khaoula Bezzina, Bchira Ben Mabrouk, Z. Mahjoub

Abstract: We are interested in an easy combinatorial optimization problem with several real-world applications, namely the matrix chain product problem, which may be solved by a well-known dynamic programming algorithm (DPA). Our contribution is two-fold. First, we design a DPA-based approach for determining multiple optimal solutions, i.e. optimal parenthesizations (OPs), which may be represented by binary in-trees (BITs). Since our aim is to efficiently parallelize the computation of the resulting product matrix, we define for this purpose a particular inter-OP comparative criterion, i.e. the cost of a critical path in each BIT. Afterwards, we design different schedulings for the corresponding BIT task graphs (TGs). The procedure begins with the construction of the earliest and latest level partitions (LPs) of the TG. Then, after choosing three granularity sizes, i.e. coarse, medium and mixed coarse-medium grains, and given an arbitrary number of processors, the proposed schedulings follow a level-by-level scanning of the LPs of the TG. In addition to the design of efficient schedulings, we also determine the minimum number of processors needed to schedule the TG in minimal time. Our contribution is validated by an experimental study carried out on a series of input data (chain lengths, matrix sizes and numbers of processors), permitting fine comparisons between the determined OPs and the designed schedulings, thus illustrating the practical interest of the study.
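The abstract above names two ingredients: a dynamic programming pass that keeps every optimal parenthesization, and a critical-path cost used to compare them. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the nested-tuple tree encoding, and the choice of weighting each tree node by the scalar multiplication count of its product are assumptions made for illustration.

```python
def matrix_chain_all_optimal(dims):
    """Matrix A_i has shape dims[i-1] x dims[i], for i = 1..n.

    Returns (minimal cost, list of all optimal parenthesizations),
    each parenthesization being a nested tuple of matrix indices.
    """
    n = len(dims) - 1
    # cost[i][j]: minimal number of scalar multiplications for A_i..A_j
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    # splits[i][j]: every split point k that reaches cost[i][j] (ties kept)
    splits = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = None
            for k in range(i, j):
                c = cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if best is None or c < best:
                    best, splits[i][j] = c, [k]
                elif c == best:
                    splits[i][j].append(k)
            cost[i][j] = best

    def trees(i, j):
        # Enumerate every binary tree consistent with the recorded splits.
        if i == j:
            yield i
            return
        for k in splits[i][j]:
            for left in trees(i, k):
                for right in trees(k + 1, j):
                    yield (left, right)

    return cost[1][n], list(trees(1, n))


def critical_path(tree, dims):
    """Return (critical path cost, (first index, last index)) of a tree.

    Assumption: a node's weight is the cost of the product it performs,
    and the critical path is the heaviest leaf-to-root path.
    """
    if isinstance(tree, int):
        return 0, (tree, tree)
    left_cost, (i, k) = critical_path(tree[0], dims)
    right_cost, (_, j) = critical_path(tree[1], dims)
    return max(left_cost, right_cost) + dims[i - 1] * dims[k] * dims[j], (i, j)
```

With four square matrices of equal size every parenthesization is optimal, so the critical-path criterion is what separates the balanced tree from the linear ones; the number of optimal trees can grow quickly with chain length, so full enumeration is only practical for short chains.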

Citations: 1

A First Investigation on the Dynamics of Two Delayed Neurons through Fuzzy Transform Approximation

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.74

S. Tomasiello

Citations: 3

Performance Evaluation of a Parallel Dynamic Programming Algorithm for Solving the 1D Array Partitioning Problem

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.59

H. Salhi, Bchira Ben Mabrouk, Z. Mahjoub

Abstract: We address the 1D array partitioning problem (1D-APP), an easy combinatorial optimization problem for which an exact dynamic programming algorithm (DPA) is known in the literature. The DPA is structured as a perfect three-DO-loop nest (3DLN) with affine loop bounds. Because its cubic complexity may be too time-consuming for large real-world problems, we propose a parallelization approach (PA). The PA starts with a dependence analysis within the nest (presented in a previous work), which permits deriving several versions of the original DPA and keeping the (theoretically) best one. Considering this best version, itself a 3DLN, our contribution detailed here first consists in choosing two task segmentations corresponding to two grain sizes, i.e. fine (resp. medium) grain, where a grain corresponds to the body of the third (resp. second) loop of the 3DLN. Afterwards, we construct particular level decompositions (LDs) of the corresponding layered task graphs and design, when an arbitrary number of processors is available, several schedulings (4 in the fine-grain case and 2 in the medium-grain case) based on scanning the levels of the LDs with and without inter-level overlapping. For each case the makespans of the schedulings are explicitly determined and analysed. Our theoretical contribution is validated through a series of simulations carried out on different input data and for different numbers of available processors. This permits a fine comparison between the different schedulings, thus showing their respective efficiencies.
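The three-loop dynamic programming structure mentioned in the abstract above can be illustrated with a short Python sketch. The abstract does not state the recurrence, so the formulation here is an assumption: the common variant that splits an array of non-negative weights into a fixed number of contiguous, non-empty intervals while minimizing the largest interval sum. The function and variable names are hypothetical.

```python
from itertools import accumulate


def partition_1d(weights, parts):
    """Minimal achievable largest interval sum (the bottleneck).

    Assumes 1 <= parts <= len(weights) and non-negative weights.
    """
    n = len(weights)
    prefix = [0] + list(accumulate(weights))  # prefix[i] = sum of first i weights
    inf = float("inf")
    # best[k][i]: minimal bottleneck when the first i elements form k intervals
    best = [[inf] * (n + 1) for _ in range(parts + 1)]
    best[0][0] = 0
    for k in range(1, parts + 1):          # outer loop: number of intervals
        for i in range(k, n + 1):          # middle loop: prefix length
            for j in range(k - 1, i):      # inner loop: start of the last interval
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
    return best[parts][n]
```

In this sketch the inner-loop body would play the role of a fine grain and the middle-loop body that of a medium grain, in the sense the abstract gives to those terms; the mapping onto the authors' own loop nest is not confirmed by the source.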

Citations: 0

CUDA Based Parallel Implementations of Space-Saving on a GPU

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.108

M. Cafaro, I. Epicoco, G. Aloisio, Marco Pulimeno

Citations: 13

Extending OmpSs to Support Data Analytics Workload

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.136

Marcos Maroñas

Citations: 0

When is the Right Time to Start the Fault Tolerance Protection?

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.70

Jorge Villamayor, Dolores Rexachs, E. Luque

Abstract: In High Performance Computing, Fault Tolerance (FT) has become a primary concern because of the constant growth and continuous aging of hardware components, which raise the probability of failures. Failures degrade the performance of the environment and significantly affect the execution time users expect. Rollback-Recovery protocols are a fundamental component for protecting and restoring the execution of users' parallel applications, although this protection comes with an overhead. This paper proposes a First Protection Point model, which determines the point at which introducing FT protection starts to yield benefits in terms of total execution time, including failures. Rollback-Recovery protocols applied to parallel applications are characterized to obtain key factors for the model design. This model can help users determine which checkpoints can be removed from the application execution when they are used for FT protection purposes, reducing the overhead while keeping high availability. An analytic model evaluation is developed to show the inflexion point where FT protection starts to provide benefits for users. Finally, three experimental environments are set up, using two private clusters and a public cluster configured in a well-known cloud, Amazon EC2. A coordinated checkpoint facility is applied to NAS benchmark applications such as CG, BT and LU to evaluate the proposed model, reducing the overhead impact of the provided Fault Tolerance.

Citations: 0

Using the Application Signature to Detect Inefficiencies Generated by Mapping Policies in Parallel Applications

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.85

C. Rangel, Alvaro Wong, Dolores Rexachs, E. Luque

Abstract: The execution of HPC applications in multicore environments can occasionally use resources inefficiently. There are idle times during the application execution that can be caused by synchronization or message passing collisions. We define this idle time as an application inefficiency; it may be caused by message passing collisions at different types of interconnections in the compute nodes. We propose a methodology to characterize the application's execution in order to analyze and detect these inefficiencies in a bounded time, as well as to locate the parallel segments of the application code (phases) in which these inefficiencies are generated. The phases represent the most relevant application behavior and are obtained by characterizing the application with the PAS2P tool. The tool allows us to predict the execution time by generating the application signature, which is composed of phases. Taking advantage of the prediction quality and the time needed to obtain the prediction of application performance, we propose modeling the factors that potentially influence the application's execution time, especially characterizing the behavior during the execution time of these phases. We performed experimental validation using signatures of NAS Parallel benchmarks in order to detect and model the inefficiencies in the application phases.

Citations: 2

A Deployment System for Highly Heterogeneous and Dynamic Environments

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.98

Leila Abidi, C. Cérin, W. Saad

Citations: 1

Modeling a Photonic Network for Exascale Computing

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.82

José Duro, S. Petit, J. Sahuquillo, M. E. Gómez

Abstract: Photonics technology has become a promising and viable alternative for both on-chip and off-chip computer networks of future Exascale systems. Nevertheless, this technology is not yet mature enough in this context, so research efforts focusing on photonic networks are still required to achieve realistic, suitable network implementations. In this context, system-level photonic network simulators can help designers assess the multiple design choices. Most current research is done on electrical network simulators, whose components work very differently from photonic components. Moreover, photonics technology adds new components that are not present in electrical networks. This paper discusses how a photonics simulation tool can be built by extending an electrical simulation framework. We summarize and compare the working behavior of both technologies, electrical and photonic, and discuss the rationale behind the proposed extensions. Among others, the devised extensions model optical routers, wavelength-division multiplexing, circuit switching, and specific routing algorithms. This work aims to support the investigation of off-chip optical networks in the context of the European Exascale System Interconnect and Storage (ExaNeSt) project. The experiments presented in this paper study multiple realistic photonic network configurations and have been performed with excerpts of real traces. Experimental results show that, compared to electrical networks, optical networks can reduce the execution time of the workload by several orders of magnitude. Our study reveals that future optical technologies presenting a 3.2 Tbps aggregate link bandwidth will not provide additional performance benefits over state-of-the-art 1.6 Tbps optical links across the studied workloads; 1.6 Tbps network links are enough to achieve the highest optical performance on computer networks. Regarding the link configuration, the bandwidth per optical channel is the parameter with the highest impact on the network delay and therefore on the execution time, while for a given optical bandwidth per channel the better strategy is to reduce the phit size.

Citations: 2

TurBase: A Software Platform for Research in Experimental and Numerical Fluid Dynamics

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.18

R. Benzi, Luca Biferale, F. Bonaccorso, H. Clercx, Alessandro Corbetta, W. Mobius, F. Toschi, F. Salvadore, C. Cacciari, G. Erbacci

Citations: 6