2017 International Conference on High Performance Computing & Simulation (HPCS): Latest Publications

Fine-Grained Parallel Solution for Solving Sparse Triangular Systems on Multicore Platform Using OpenMP Interface

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.102

Sirine Marrakchi, M. Jemni

Citations: 7

iAgile: Mission Critical Military Software Development

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.87

L. Benedicenti, A. Messina, A. Sillitti

Abstract: This paper reports the experience of applying agile methods in the defense sector, which is characterized mostly by embedded and mission-critical software. We describe the experience of creating a Command and Control system for the 4th Logistic Department of the Italian Army's General Staff. The Army approved the project as a pilot to determine whether it was possible to reduce development costs while producing a product more responsive to changing conditions in the theatre of operations, where confrontation has often become asymmetric and requires much faster reaction times than the conventional approach allows. After 13 five-week sprints, we delivered a complete product that met all user requirements and satisfied the Army's regulatory requirements. Achieving this result required a concerted effort to change the development culture, but even when this effort is counted as part of the development costs, the total development costs were lower than those of the traditional development method. This paper summarizes the experience, trying whenever possible to quantify the results and to support the observed positive results with appropriate data.

Citations: 7

Memory Aware Poisson Solver for Peta-Scale Simulations with one FFT Diagonalizable Direction

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.26

G. Oyarzun, R. Borrell, F. Trias, A. Oliva

Abstract: Problems with some sort of divergence constraint are found in many disciplines; computational fluid dynamics, linear elasticity and electrostatics are examples. Such a constraint leads to a Poisson equation, which is usually one of the most computationally intensive parts of scientific simulation codes. In this work, we present a memory-aware Poisson solver for problems with one Fourier-diagonalizable direction. This diagonalization decomposes the original 3D system into a set of independent 2D subsystems. The proposed algorithm focuses on optimizing memory allocations and transactions by taking into account redundancies among these 2D subsystems. Moreover, we take advantage of the uniformity of the solver along the periodic direction to vectorize it. Additionally, our novel approach automatically optimizes the choice of the preconditioner used to solve each frequency subsystem and dynamically balances its parallel distribution. Altogether, these features constitute a highly efficient and robust HPC Poisson solver that has been successfully tested on up to 16,384 CPU cores.

Citations: 0
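The key step in the Poisson abstract is that a Fourier transform along the single periodic direction splits the 3D system into independent 2D subsystems, one per frequency. The Python sketch below shows the same idea one dimension lower. It is built on our own assumptions and is not the authors' code: a 2D grid, zero Dirichlet boundaries in x, periodicity in z, a standard second-order finite-difference Laplacian and a naive DFT.

After transforming along z, Fourier mode k only sees the eigenvalue λ_k = -(4/h_z²)·sin²(πk/N_z) of the periodic second difference. Each frequency therefore becomes an independent tridiagonal system in x, which the sketch solves with the Thomas algorithm. All function names are hypothetical.

```python
import cmath
import math

def dft(v):
    """Naive O(N^2) DFT: X[k] = sum_j v[j] * exp(-2*pi*i*k*j/N)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(v):
    """Inverse of dft(): x[j] = (1/N) * sum_k v[k] * exp(+2*pi*i*k*j/N)."""
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def periodic_eigenvalue(k, n, h):
    """Eigenvalue of the periodic stencil (u[j-1] - 2u[j] + u[j+1]) / h^2 on Fourier mode k."""
    return -4.0 / h ** 2 * math.sin(math.pi * k / n) ** 2

def solve_tridiagonal(a, b, c, rhs):
    """Thomas algorithm for a*x[i-1] + b*x[i] + c*x[i+1] = rhs[i] with x[-1] = x[n] = 0."""
    n = len(rhs)
    cp, dp = [0j] * n, [0j] * n
    cp[0], dp[0] = c / b, rhs[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (rhs[i] - a * dp[i - 1]) / denom
    x = [0j] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_poisson_2d(f, hx, hz):
    """Solve u_xx + u_zz = f; f[i][j] has x index i (zero Dirichlet) and z index j (periodic)."""
    nx, nz = len(f), len(f[0])
    f_hat = [dft(row) for row in f]            # 1) transform along the periodic z-direction
    u_hat = [[0j] * nz for _ in range(nx)]
    a = c = 1.0 / hx ** 2                      # off-diagonals are shared by every frequency
    for k in range(nz):                        # 2) one independent subsystem per frequency
        b = -2.0 / hx ** 2 + periodic_eigenvalue(k, nz, hz)  # only the diagonal depends on k
        col = solve_tridiagonal(a, b, c, [f_hat[i][k] for i in range(nx)])
        for i in range(nx):
            u_hat[i][k] = col[i]
    return [[z.real for z in idft(row)] for row in u_hat]    # 3) back to physical space

def apply_laplacian(u, hx, hz):
    """Discrete Laplacian with the same boundary conditions, used to check the residual."""
    nx, nz = len(u), len(u[0])
    out = [[0.0] * nz for _ in range(nx)]
    for i in range(nx):
        for j in range(nz):
            left = u[i - 1][j] if i > 0 else 0.0
            right = u[i + 1][j] if i < nx - 1 else 0.0
            out[i][j] = ((left - 2 * u[i][j] + right) / hx ** 2
                         + (u[i][j - 1] - 2 * u[i][j] + u[i][(j + 1) % nz]) / hz ** 2)
    return out
```

In this sketch the frequency subsystems differ only by the diagonal shift λ_k. That is presumably the kind of redundancy a memory-aware implementation can exploit by storing the shared part of the operator once. According to the abstract, the paper also picks a preconditioner per frequency subsystem for an iterative solve; the direct tridiagonal solve here merely stands in for that step.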

An Automation Framework for Benchmarking and Optimizing Performance of Remote Desktops in the Cloud

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.113

A. Pandey, Lan Vu, Vivek Puthiyaveettil, Hari Sivaraman, Uday Kurkure, Aravind Bappanadu

Citations: 7

An Efficient Transaction-Based GPU Implementation of Minimum Spanning Forest Algorithm

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.100

Shayan Manoochehri, B. Goodarzi, D. Goswami

Abstract: General-purpose GPUs (GPGPUs) are ideal platforms for the parallel execution of applications with regular shared-memory access patterns. However, the majority of real-world multithreaded applications access shared memory with irregular patterns. Minimum Spanning Forest (MSF) calculation arises in many real-world applications. Boruvka's algorithm for calculating the MSF exposes the most parallelism; however, it is a challenging irregular algorithm to implement on GPUs. In this paper we show that a transaction-based design and implementation of Boruvka's algorithm on the GPU can handle some of the challenges arising from irregularity. First, we identify the hotspots of the algorithm that are the main bottlenecks: edge discovery and merge. The edge discovery phase is implemented using lock-free synchronization after extracting certain algebraic properties (e.g., monotonicity) of the computation. The merge phase, however, lacks such algebraic properties, and hence we use a Software Transactional Memory (STM) based synchronization method. STM offers ease of use by guaranteeing deadlock- and livelock-free behavior, as opposed to blocking lock-based synchronization. It also improves programmability by providing high-level abstractions for synchronization, which facilitate a natural transition from algorithm design to implementation. In addition, we employ several optimization techniques in different phases of the algorithm to achieve load balance and better GPU resource utilization. Experimental results show that our GPU-based implementation outperforms both the fastest sequential implementation and the existing STM-based implementation on multicore CPUs when tested on large-scale graphs with diverse densities.

Citations: 5
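As a point of reference for the MSF abstract, the following minimal sequential Python sketch of Boruvka's algorithm has the two phases the abstract names. Edge discovery finds the cheapest edge leaving each component, and merge joins components along those edges. It is not the authors' GPU or STM implementation. The union-find structure and the (weight, edge index) tie-breaking rule are our own choices, and the function name is hypothetical.

```python
def boruvka_msf(n, edges):
    """Minimum spanning forest by Boruvka's algorithm (sequential sketch).

    n: number of vertices labelled 0..n-1
    edges: list of (u, v, weight) tuples
    Returns the indices of the edges in the forest.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    while True:
        # Edge discovery: cheapest edge leaving each component.
        # The update only ever lowers cheapest[r] (a monotone min); a lock-free
        # GPU version could exploit this with an atomic min, but here it is
        # a plain sequential loop.
        cheapest = [None] * n
        for idx, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            key = (w, idx)  # break weight ties by edge index so no cycle can form
            for r in (ru, rv):
                if cheapest[r] is None or key < cheapest[r]:
                    cheapest[r] = key
        # Merge: hook components together along their cheapest edges.
        merged = False
        for r in range(n):
            if cheapest[r] is None:
                continue
            u, v, _ = edges[cheapest[r][1]]
            ru, rv = find(u), find(v)
            if ru != rv:  # both endpoints' components may have picked the same edge
                parent[ru] = rv
                chosen.append(cheapest[r][1])
                merged = True
        if not merged:
            return chosen
```

On a graph with several connected components the loop stops once no edge joins two different components, so the result is a forest rather than a single tree.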

Efficient Data-Driven Task Allocation for Future Many-Cluster On-chip Systems

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.81

A. Scionti, Somnath Mazumdar, A. Portero

Abstract: Continuous demand for higher performance puts increasing pressure on hardware designers to provide faster machines with low energy consumption. Recent technological advances allow a group of silicon dies to be placed on top of a conventional interposer (silicon layer), which provides space to integrate logic and interconnection resources to manage active processing cores. However, such large resource availability requires an adequate Program eXecution Model (PXM) as well as an efficient mechanism to allocate resources in the system. From this perspective, fine-grain data-driven PXMs are an attractive way to reduce the cost of synchronising concurrent activities. The contribution of this work is twofold. First, a hardware architecture called TALHES (a Task ALlocator for HEterogeneous Systems) is proposed to support the scheduling of multi-threaded applications that adhere to an explicit data-driven PXM. TALHES introduces a Network-on-Chip (NoC) extension: i) on-chip 2D-mesh NoCs support locality of computation in the execution of a single task, while ii) a global task scheduler integrated into the silicon interposer orchestrates application tasks among different clusters of cores (possibly with different computing capabilities). The second contribution is a simulation framework tailored to support the analysis of such fine-grain data-driven applications. In this work, Linux Containers are used to abstract and efficiently simulate clusters of cores (i.e., a single die), as well as the behaviour of the global scheduling unit.

Citations: 1
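To make "fine-grain data-driven" execution concrete, here is a hypothetical Python sketch of the general technique. Each task carries a counter of pending inputs and becomes ready when that counter reaches zero. A single global scheduler then places ready tasks on the least-loaded cluster. The least-loaded policy, the cost model, and the simplification that a task's outputs are available as soon as it is dispatched are all assumptions for illustration. The abstract does not describe the actual allocation policy of TALHES.

```python
from collections import deque

def data_driven_schedule(tasks, deps, num_clusters):
    """Hypothetical data-driven allocator.

    tasks: dict task name -> estimated cost
    deps: dict task name -> list of producer tasks whose outputs it consumes
    Returns (dispatch order as (task, cluster) pairs, accumulated load per cluster).
    """
    consumers = {t: [] for t in tasks}
    pending = {t: len(deps.get(t, [])) for t in tasks}   # synchronisation counters
    for t, producers in deps.items():
        for p in producers:
            consumers[p].append(t)
    ready = deque(t for t in tasks if pending[t] == 0)
    load = [0] * num_clusters
    order = []
    while ready:
        t = ready.popleft()
        # Global scheduler: place the ready task on the least-loaded cluster.
        cluster = min(range(num_clusters), key=lambda c: load[c])
        load[cluster] += tasks[t]
        order.append((t, cluster))
        # Simplification: outputs count as delivered at dispatch time.
        for consumer in consumers[t]:
            pending[consumer] -= 1
            if pending[consumer] == 0:   # all inputs present: the task may fire
                ready.append(consumer)
    if len(order) != len(tasks):
        raise ValueError("cyclic dependencies: some tasks never became ready")
    return order, load
```

No locks or barriers appear in this sketch. A task's readiness is decided solely by its own counter, which is the property that makes data-driven models attractive for cheap synchronisation.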

Parallel Adaptively Restrained Molecular Dynamics

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.55

Krishnavir Singh, D. F. Marin, S. Redon

Citations: 4

Reducing the Memory Footprint of an Eikonal Solver

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.57

Daniel Ganellari, G. Haase

Abstract: The numerical solution of the Eikonal equation follows the fast iterative method, applied here to tetrahedral meshes. The main operations in each discretization element $\tau$ involve various inner products in the $M$-metric, such as $(\vec{e}_{k,s}, \vec{e}_{s,\ell})_{M_\tau} = \vec{e}_{k,s}^{T} \cdot M_\tau \cdot \vec{e}_{s,\ell}$, where $\vec{e}_{s,\ell}$ is the edge connecting vertices $s$ and $\ell$ in element $\tau$. Instead of passing all coordinates of the tetrahedron together with the 6 entries of $M_\tau$, we precompute these inner products and use only them in the wave-front computation. This first change requires fewer memory transfers for each tetrahedron. The second change follows from the fact that $(\vec{e}_{k,s}, \vec{e}_{s,\ell})_{M_\tau}$ with $k \neq \ell$ represents an angle of a surface triangle, whereas $(\vec{e}_{k,s}, \vec{e}_{k,s})_{M_\tau}$ represents the length of an edge in the $M$-metric. Basic geometry and vector arithmetic lead to the conclusion that the angle information can be expressed as a combination of three edge lengths. Therefore we only have to precompute the 6 edge lengths of a tetrahedron and compute the remaining 12 angle values on the fly, which reduces the memory footprint per tetrahedron to 6 numbers. The efficient implementation of the two changes requires a local Gray-code numbering of the edges in the tetrahedron and a number of bit shifts to assign the appropriate data. First numerical experiments on CPUs show that the reduced-memory-footprint approach is faster than the original implementation. Detailed investigations as well as a CUDA implementation are ongoing work.

Citations: 3
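The Eikonal abstract claims that each "angle" inner product can be rebuilt from three edge lengths. This follows from the polarization identity. Assume, for the sketch only, the orientation $\vec{e}_{k,s} = x_s - x_k$, so that $\vec{e}_{k,s} + \vec{e}_{s,\ell} = \vec{e}_{k,\ell}$. Expanding $(\vec{e}_{k,\ell}, \vec{e}_{k,\ell})_{M_\tau}$ with a symmetric $M_\tau$ gives $(\vec{e}_{k,s}, \vec{e}_{s,\ell})_{M_\tau} = \tfrac{1}{2}\big((\vec{e}_{k,\ell}, \vec{e}_{k,\ell})_{M_\tau} - (\vec{e}_{k,s}, \vec{e}_{k,s})_{M_\tau} - (\vec{e}_{s,\ell}, \vec{e}_{s,\ell})_{M_\tau}\big)$. Here "edge length" is read as the squared $M$-length. Each of the 4 faces contributes 3 vertex angles, which gives the 12 angle values.

The Python sketch below stores only the 6 squared lengths and recomputes the inner products on the fly. The simple edge order is a hypothetical stand-in for the paper's Gray-code numbering and bit shifts, and all function names are ours.

```python
# Hypothetical local edge order; the paper uses a Gray-code numbering instead.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def m_inner(a, b, M):
    """Inner product a^T * M * b for 3-vectors a, b and a symmetric 3x3 matrix M."""
    return sum(a[i] * M[i][j] * b[j] for i in range(3) for j in range(3))

def edge_vec(verts, p, q):
    """Edge vector e_{p,q} = x_q - x_p (orientation assumed for this sketch)."""
    return [verts[q][i] - verts[p][i] for i in range(3)]

def precompute_sq_lengths(verts, M):
    """The only per-tetrahedron data kept: 6 squared edge lengths in the M-metric."""
    return {frozenset((p, q)): m_inner(edge_vec(verts, p, q), edge_vec(verts, p, q), M)
            for p, q in EDGES}

def inner_on_the_fly(sq, k, s, l):
    """(e_{k,s}, e_{s,l})_M for distinct k, s, l, rebuilt from three squared lengths
    via the polarization identity (uses e_{k,s} + e_{s,l} = e_{k,l})."""
    return 0.5 * (sq[frozenset((k, l))] - sq[frozenset((k, s))] - sq[frozenset((s, l))])
```

A squared length does not depend on edge orientation, so an unordered vertex pair is enough as the lookup key.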

Limitations of Energy Expenditure Calculation Based on a Mobile Phone Accelerometer

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.29

I. Pires, Virginie Felizardo, Nuno Pombo, N. Garcia

Citations: 19

Design and Implementation of SDN-enhanced MPI Broadcast Targeting a Fat-Tree Interconnect

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI: 10.1109/HPCS.2017.46

H. Morimoto, Khureltulga Dashdavaa, Keichi Takahashi, Y. Kido, S. Date, S. Shimojo

Abstract: To meet rising demands on high-performance computing, the number of computing nodes in high-performance computing systems has been growing continuously. At the same time, the complexity of the networks linking these nodes, the interconnect, has also been increasing. Considering the scale-out of computing nodes in future high-performance computing systems, it is unrealistic to keep adding nodes while provisioning enough network capacity to accommodate maximum traffic. We have worked on SDN-enhanced MPI, based on the challenging idea that network traffic should be controlled according to the time-varying requirements of the applications running on high-performance computing systems. In particular, this paper aims to accelerate MPI_Bcast execution through Software-Defined Networking (SDN), targeting a high-performance computing system with a Fat-tree interconnect. The MPI_Bcast proposed in this paper builds the data delivery tree based on traffic information obtained from the SDN switches that make up the interconnect. In our evaluation, the proposed MPI_Bcast ran up to 8.6 times faster than our previous MPI_Bcast implementation when 700 Mbps of pseudo traffic was flowing on the Fat-tree interconnect.

Citations: 3
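The SDN abstract does not say how the delivery tree is computed from the switch statistics. One plausible sketch, which is purely our assumption, weights each link by 1 plus its observed load relative to capacity. It then takes a shortest-path tree from the broadcast root with Dijkstra's algorithm, so the broadcast tends to avoid congested links. The topology is a toy two-level fat tree with hypothetical node names. The 700 Mbps load echoes the evaluation scenario in the abstract, and the 1000 Mbps capacity is assumed.

```python
import heapq

def build_delivery_tree(links, loads, root, capacity_mbps=1000.0):
    """Shortest-path broadcast tree under load-aware link costs (hypothetical sketch).

    links: iterable of undirected (a, b) pairs
    loads: dict mapping frozenset({a, b}) to observed traffic in Mbps
    Returns a dict child -> parent (the root maps to None).
    """
    adj = {}
    for a, b in links:
        # Busy links cost more, so the tree tends to route around them.
        cost = 1.0 + loads.get(frozenset((a, b)), 0.0) / capacity_mbps
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                parent[v] = u
                heapq.heappush(heap, (d + cost, v))
    return parent

# Toy two-level fat tree: hosts h0, h1 under edge switch e0, h2, h3 under e1,
# and both edge switches uplinked to core switches c0 and c1.
FAT_TREE_LINKS = [("h0", "e0"), ("h1", "e0"), ("h2", "e1"), ("h3", "e1"),
                  ("e0", "c0"), ("e0", "c1"), ("e1", "c0"), ("e1", "c1")]
```

With 700 Mbps on the e0-c0 link, a broadcast rooted at h0 reaches e1 through c1 instead of the loaded c0 path. With no load, the tie-breaking in this sketch routes it through c0. In an SDN setting the resulting parent map would then be installed as forwarding rules on the switches; that step is outside this sketch.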