2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: Latest Publications

Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications
Rengan Xu, M. Araya-Polo, B. Chapman
{"title":"Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications","authors":"Rengan Xu, M. Araya-Polo, B. Chapman","doi":"10.1109/IPDPSW.2013.196","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.196","url":null,"abstract":"The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications need to perform I/O operations for very large volume of data frequently, which will harm the performance seriously. This work's motivation are geophysical applications used for oil and gas exploration. These applications process Terabyte size datasets in HPC facilities. The datasets represent subsurface models and field recorded data. In general term, these applications read as inputs and write as intermediate/final results huge amount of data, where the underlying algorithms implement seismic imaging techniques. The traditional sequential I/O, even when couple with advance storage systems, cannot complete all I/O operations for so large volumes of data in an acceptable time range. Parallel I/O is the general strategy to solve such problems. However, because of the dynamic property of many of these applications, each parallel process does not know the data size it needs to write until its computation is done, and it also cannot identify the position in the file to write. In order to write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load balancing framework that is general enough for most of these applications. And to reduce the expensive synchronization and communication overhead, we introduced a I/O node that only handles I/O request and let compute nodes perform I/O operations in parallel. By using both POSIX I/O and memory-mapping interfaces, the experiment indicates that our approach is scalable. For instance, with 16 processes, the bandwidth of parallel reading can reach the theoretical peak performance (2.5 GB/s) of the storage infrastructure. Also, the parallel writing can be up to 4.68x (speedup, POSIX I/O) and 7.23x (speedup, memory-mapping) more efficient than the serial I/O implementation. Since, most geophysical applications are I/O bounded, these results positively impact the overall performance of the application, and confirm the chosen strategy as path to follow.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122931832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
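To make the offset-coordination pattern from the abstract above concrete, here is a minimal Python sketch (not the authors' implementation): each process only learns its output size after computing, obtains its file offset from a prefix sum over all sizes, and then writes its slice of a shared file through a memory mapping. The use of mpi4py, the file name, and the data sizes are illustrative assumptions; run under mpirun, e.g. `mpirun -n 16 python sketch.py`.

```python
# Hedged sketch: each rank computes a variable-sized result, learns its file
# offset via a prefix sum over result sizes, then writes its slice through mmap.
from mpi4py import MPI          # assumption: mpi4py is available
import mmap
import os
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder computation: the result size is only known once it finishes.
result = np.random.rand(1000 * (rank + 1))
nbytes = result.nbytes

# Inclusive prefix sum of sizes; subtracting our own size gives our offset.
offset = comm.scan(nbytes, op=MPI.SUM) - nbytes
total = comm.allreduce(nbytes, op=MPI.SUM)

path = "output.bin"             # hypothetical output file
if rank == 0:                   # one process sizes the shared file up front
    with open(path, "wb") as f:
        f.truncate(total)
comm.Barrier()

# Every rank maps the file and writes its own slice in parallel.
fd = os.open(path, os.O_RDWR)
with mmap.mmap(fd, total) as mm:
    mm[offset:offset + nbytes] = result.tobytes()
    mm.flush()
os.close(fd)
```

In the paper's setting the size exchange would presumably go through the dedicated I/O node rather than a collective, but the write phase stays the same: once offsets are known, compute processes write their regions independently.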
High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand
Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiaoyi Lu, Jithin Jose, H. Subramoni, Hao Wang, D. Panda
{"title":"High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand","authors":"Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiaoyi Lu, Jithin Jose, H. Subramoni, Hao Wang, D. Panda","doi":"10.1109/IPDPSW.2013.238","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.238","url":null,"abstract":"MapReduce is a very popular programming model used to handle large datasets in enterprise data centers and clouds. Although various implementations of MapReduce exist, Hadoop MapReduce is the most widely used in large data centers like Facebook, Yahoo! and Amazon due to its portability and fault tolerance. Network performance plays a key role in determining the performance of data intensive applications using Hadoop MapReduce as data required by the map and reduce processes can be distributed across the cluster. In this context, data center designers have been looking at high performance interconnects such as InfiniBand to enhance the performance of their Hadoop MapReduce based applications. However, achieving better performance through usage of high performance interconnects like InfiniBand is a significant task. It requires a careful redesign of communication framework inside MapReduce. Several assumptions made for current socket based communication in the current framework do not hold true for high performance interconnects. In this paper, we propose the design of an RDMA-based Hadoop MapReduce over InfiniBand and several design elements: data shuffle over InfiniBand, in-memory merge mechanism for the Reducer, and pre-fetch data for the Mapper. We perform our experiments on native InfiniBand using Remote Direct Memory Access (RDMA) and compare our results with that of Hadoop-A [1] and default Hadoop over different interconnects and protocols. For all these experiments, we perform network level parameter tuning and use optimum values for each Hadoop design. Our performance results show that, for a 100GB TeraSort running on an eight node cluster, we achieve a performance improvement of 32% over IP-over InfiniBand (IPoIB) and 21% over Hadoop-A. With multiple disks per node, this benefit rises up to 39% over IPoIB and 31% over Hadoop-A.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114593608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
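The RDMA and Verbs-level plumbing of the design above cannot be reproduced faithfully in a few lines, but the Reducer's in-memory merge idea is easy to illustrate: already-sorted map-output partitions are merged lazily in memory rather than spilled to disk. The sketch below is a generic Python illustration under that assumption, not the paper's Hadoop code; the sample data and reduce function are hypothetical.

```python
# Hedged sketch of a reduce-side in-memory merge: map outputs arrive as
# already-sorted (key, value) lists and are merged lazily without disk spills.
# The RDMA-based shuffle itself is not modeled here.
import heapq
from itertools import groupby
from operator import itemgetter

def reduce_side_merge(sorted_partitions, reduce_fn):
    """sorted_partitions: iterable of (key, value) lists, each sorted by key."""
    merged = heapq.merge(*sorted_partitions, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, reduce_fn(key, [v for _, v in group])

# Hypothetical usage: word-count style reduce over three mappers' outputs.
parts = [
    [("apple", 1), ("cat", 2)],
    [("apple", 3), ("dog", 1)],
    [("cat", 1), ("dog", 4)],
]
print(list(reduce_side_merge(parts, lambda key, values: sum(values))))
# -> [('apple', 4), ('cat', 3), ('dog', 5)]
```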
A Scalable Implicit Solver for Phase Field Crystal Simulations
Chao Yang, Xiaobin Cai
{"title":"A Scalable Implicit Solver for Phase Field Crystal Simulations","authors":"Chao Yang, Xiaobin Cai","doi":"10.1109/IPDPSW.2013.37","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.37","url":null,"abstract":"The phase field crystal equation (PFC) is a popular model for simulating micro-structures in materials science and is very computationally expensive to solve. A highly scalable solver for PFC modeling is presented in this paper. The equation is discredited with a stabilized implicit finite difference method and the time step size is adaptively controlled to obtain physically meaningful solutions. The nonlinear system arising at each time step is solved by using a parallel Newton-Krylov-Schwarz algorithm. In order to achieve good performance, low-order homogeneous boundary conditions are imposed on the sub domain boundary in the Schwarz preconditioner. Experiments are carried out to exploit optimal choices of the preconditioner type, the sub domain solver and the overlap size. Numerical results are provided to show that the solver is scalable to thousands of processor cores.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"276 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122168715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
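For readers unfamiliar with the model, the PFC equation in its commonly used dimensionless (Elder-Grant) form, together with a plain backward-Euler step of the kind that produces a nonlinear system at every time step, looks roughly as follows; this is background only, and the paper's exact stabilized discretization and adaptive step-size control may differ.

```latex
% Commonly used dimensionless form of the PFC model (assumption: the paper's
% exact stabilized scheme may differ), followed by a backward-Euler step.
\begin{align}
  \frac{\partial \phi}{\partial t}
    &= \nabla^{2}\!\left[(1+\nabla^{2})^{2}\phi - \epsilon\,\phi + \phi^{3}\right],\\
  \frac{\phi^{n+1}-\phi^{n}}{\Delta t}
    &= \nabla^{2}\!\left[(1+\nabla^{2})^{2}\phi^{n+1} - \epsilon\,\phi^{n+1}
       + \bigl(\phi^{n+1}\bigr)^{3}\right].
\end{align}
```

Each implicit step then requires solving a nonlinear system F(phi^{n+1}) = 0, which is where the parallel Newton-Krylov-Schwarz algorithm and the Schwarz preconditioner with low-order homogeneous subdomain boundary conditions come in.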
Fine-Grained Manipulation of FPGA Configuration for Incremental Design
Wenwei Zha, P. Athanas
{"title":"Fine-Grained Manipulation of FPGA Configuration for Incremental Design","authors":"Wenwei Zha, P. Athanas","doi":"10.1109/IPDPSW.2013.199","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.199","url":null,"abstract":"This paper presents the technique of manipulating FPGA configuration in fine granularity to improve the efficiency of incremental design. The main contributions are achieving hardware autonomy and enhancing hardware development productivity, demonstrated by two categories of applications: implementing Autonomous Adaptive Systems and Fast System Progotyping. Vendor tools provide limited facilitation for these applications. For the first category, a system with a universal UART transmitter is demonstrated on the ML410 FPGA board. The BAUD rate generating circuit is autonomously modified in hardware to adapt to the requirement of a remote UART receiver. For the second category, fast module assembly for prototyping a GNU Radio system is demonstrated on the XUPV5-LX110T FPGA board. Its run-time is tens of times faster than that of the vendor tool. Moreover, to evaluate the quality of the proposed fine-grained manipulation, wire delay information is approximated through brute-force analysis. The delay estimation result achieves accuracy within 6% error as compared to that of the vendor tool's.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117098051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Inferring Large-Scale Computation Behavior via Trace Extrapolation
L. Carrington, M. Laurenzano, Ananta Tiwari
{"title":"Inferring Large-Scale Computation Behavior via Trace Extrapolation","authors":"L. Carrington, M. Laurenzano, Ananta Tiwari","doi":"10.1109/IPDPSW.2013.137","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.137","url":null,"abstract":"Understanding large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we present a methodology for characterizing an MPI application's large-scale computation behavior and system requirements using information about the behavior of that application at a series of smaller core counts. The methodology finds the best statistical fit from among a set of canonical functions in terms of how a set of features that are important for both performance and energy (cache hit rates, floating point intensity, ILP, etc.) change across a series of small core counts. The statistical models for each of these application features can then be utilized to generate an extrapolated trace of the application at scale. The fidelity of the fully extrapolated traces is evaluated by comparing the results of building performance models using both the extrapolated trace along with an actual trace in order to predict application performance at using each. For two full-scale HPC applications, SPECFEM3D and UH3D, the extrapolated traces had absolute relative errors of less than 5%.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126701316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
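A minimal sketch of the extrapolation step described above: each application feature measured at small core counts is fitted against a few canonical functions, the best fit (smallest residual) is kept, and the fitted model is evaluated at the target core count. The canonical forms, the feature values, and the core counts below are illustrative assumptions, not the paper's actual function set or data.

```python
# Hedged sketch: fit a per-feature trend observed at small core counts against
# a few canonical forms, keep the best fit, and evaluate it at the target scale.
import numpy as np
from scipy.optimize import curve_fit

CANONICAL = {
    "constant":    lambda p, a:    a + 0 * p,
    "logarithmic": lambda p, a, b: a + b * np.log(p),
    "power":       lambda p, a, b: a * p ** b,
    "linear":      lambda p, a, b: a + b * p,
}

def best_fit(cores, values):
    """Return (name, model, params) of the form with the smallest residual."""
    best = None
    for name, model in CANONICAL.items():
        try:
            params, _ = curve_fit(model, cores, values, maxfev=10000)
        except RuntimeError:
            continue                              # this form failed to converge
        resid = np.sum((model(cores, *params) - values) ** 2)
        if best is None or resid < best[0]:
            best = (resid, name, model, params)
    return best[1], best[2], best[3]

# Hypothetical feature: cache hit rate measured at 8..128 cores,
# extrapolated to a 16384-core run.
cores = np.array([8.0, 16.0, 32.0, 64.0, 128.0])
values = np.array([0.92, 0.90, 0.89, 0.875, 0.865])
name, model, params = best_fit(cores, values)
print(name, model(np.array([16384.0]), *params))
```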
Accelerating Dynamics Simulation of Solidification Processes of Liquid Metals Using GPU with CUDA
Jie Liang, Kenli Li, Lin Shi, Yingqiang Liao
{"title":"Accelerating Dynamics Simulation of Solidification Processes of Liquid Metals Using GPU with CUDA","authors":"Jie Liang, Kenli Li, Lin Shi, Yingqiang Liao","doi":"10.1109/IPDPSW.2013.84","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.84","url":null,"abstract":"Molecular dynamics simulation is a powerful tool to simulate and analyze complex physical processes and phenomena at atomic characteristic for predicting the natural time-evolution of a system of atoms. Precise simulation of processes such as liquid metal solidification processes simulation has strong requirements both in the simulation size and computing timescale. Therefore, finding available computing resources is crucial to accelerate computation of solidification processes simulations. This paper presents a new approach to accelerate calculation of liquid metal solidification processes based on the previous study implemented on the CPU clusters, where the GPU-based MD (molecular dynamics) algorithm using a fine-grained spatial decomposition method enlarge the scale of the simulation system to a simulation system involving 10, 000, 000 atoms. The algorithms are implemented using FORTRAN and CUDA on a commodity NVIDIA Tesla M2050 card, where experimental results demonstrate that GPU-based calculations are typically 9~11 times faster than the corresponding sequential execution and approximately 1.5~2 times faster than 16-CPU clusters implementations.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126991539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
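The abstract does not show code, but the fine-grained spatial decomposition it refers to is typically realized as a cell list: atoms are binned into cells no smaller than the interaction cutoff so that each atom only needs to examine its own and neighboring cells, and on the GPU each cell or atom is then mapped to a thread block or thread. Below is a plain-NumPy sketch of that decomposition under those assumptions; the box size, cutoff, and atom count are hypothetical, and the paper's FORTRAN/CUDA kernels are not reproduced.

```python
# Hedged sketch of a cell-list spatial decomposition: bin atoms into cells of
# edge >= cutoff, then enumerate interacting pairs by scanning adjacent cells
# only (instead of all O(N^2) pairs).
import numpy as np
from collections import defaultdict
from itertools import product

def build_cell_list(pos, box, cutoff):
    ncell = max(1, int(box // cutoff))            # cells per box edge
    cell_of = (np.floor(pos / box * ncell).astype(int)) % ncell
    cells = defaultdict(list)
    for i, c in enumerate(map(tuple, cell_of)):
        cells[c].append(i)
    return cells, ncell

def neighbor_pairs(pos, box, cutoff):
    """Yield atom pairs (i, j), i < j, closer than cutoff under periodic wrap."""
    cells, ncell = build_cell_list(pos, box, cutoff)
    for c, members in cells.items():
        # Distinct neighboring cells (including the cell itself).
        nbs = {tuple((c[d] + off[d]) % ncell for d in range(3))
               for off in product((-1, 0, 1), repeat=3)}
        for nb in nbs:
            for i in members:
                for j in cells.get(nb, ()):
                    if i < j:
                        r = pos[i] - pos[j]
                        r -= box * np.round(r / box)   # minimum-image convention
                        if np.dot(r, r) < cutoff ** 2:
                            yield i, j

# Hypothetical toy system: 1,000 atoms in a 20x20x20 box with a 2.5 cutoff.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 20.0, size=(1000, 3))
print(sum(1 for _ in neighbor_pairs(positions, 20.0, 2.5)))
```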
Increasing the Scalability of PISM for High Resolution Ice Sheet Models
P. Dickens, T. Morey
{"title":"Increasing the Scalability of PISM for High Resolution Ice Sheet Models","authors":"P. Dickens, T. Morey","doi":"10.1109/IPDPSW.2013.255","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.255","url":null,"abstract":"The issue of global climate change is of great interest to scientists and a critical concern of society at large. One important piece of the climate puzzle is how the dynamics of large-scale ice sheets, such as those in Greenland and Antarctica, will react in response to such climate change. Domain scientists have developed several simulation models to predict and understand the behavior of large-scale ice sheets, but the depth of knowledge gained from such models is largely dependent upon the resolution at which they can be efficiently executed. The problem, however, is that relatively small increases in the resolution of the model result in very large increases in the size of the input and output data sets, and an explosion in the number of grid points that must be considered by the simulation. Thus, increasing the resolution of ice-sheet models, in general, requires the use of supercomputing technologies and the application of tools and techniques developed within the high-performance computing research community. In this paper, we discuss our work in evaluating and increasing the performance of the Parallel Ice Sheet Model (PISM) [6, 25, 38], using a high-resolution model of the Greenland ice sheet, on a state-of-the-art supercomputer. In particular, we found that the computation performed by PISM was highly scalable, but that the I/O demands of the higher-resolution model were a significant drag on overall performance. We then performed a series of experiments to determine the cause of the relatively poor I/O performance and how such performance could be improved. By making simple changes to the PISM source code and one of the I/O libraries used by PISM we were able to provide an 8-fold increase in I/O performance.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123651569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Evaluating the Flexibility of Dynamic Loop Scheduling on Heterogeneous Systems in the Presence of Fluctuating Load Using SimGrid
Nitin Sukhija, I. Banicescu, Srishti Srivastava, F. Ciorba
{"title":"Evaluating the Flexibility of Dynamic Loop Scheduling on Heterogeneous Systems in the Presence of Fluctuating Load Using SimGrid","authors":"Nitin Sukhija, I. Banicescu, Srishti Srivastava, F. Ciorba","doi":"10.1109/IPDPSW.2013.132","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.132","url":null,"abstract":"Scientific applications running on heterogeneous computing systems, which often have unpredictable behavior, enhance their performance by employing loop scheduling techniques as methods to avoid load imbalance through an optimized assignment of their parallel loops. With current computing platforms facilitating petascale performance and promising exascale performance towards the end of the present decade, efficient and robust algorithms are required to guarantee optimal performance of parallel applications in the presence of unpredictable perturbations. A number of dynamic loop scheduling (DLS) methods based on probabilistic analyses have been developed to achieve the desired robust performance. In earlier work, two metrics (flexibility and resilience) have been formulated to quantify the robustness of various DLS methods in heterogeneous computing systems with uncertainties. In this work, to ensure robust performance of the scientific applications on current (petascale) and future(exascale) high performance computing systems, a simulation model was designed and integrated into the SimGrid simulation toolkit, thus enabling a comprehensive study of the robustness of the DLS methods which uses results of experimental cases with various combinations of number of processors, problem sizes, and scheduling methods. The DLS methods have been implemented into the simulation model and analyzed for the purpose of exploring their flexibility (robustness against unpredictable variations in the system load), when involved in a range of case scenarios comprised of various distributions characterizing loop iteration execution times and system availability. The simulation results reported are used to compare the robustness of the DLS methods under the various environments considered, using the flexibility metric.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114631064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
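As a back-of-the-envelope illustration of the flexibility being measured above, the toy discrete-event simulation below (plain Python, not SimGrid, and not one of the paper's DLS methods) compares a static block partition of loop iterations with simple dynamic self-scheduling on heterogeneous workers whose effective speed fluctuates per chunk. All speeds, chunk sizes, and perturbation ranges are made-up values.

```python
# Hedged toy simulation: static block partition vs. dynamic self-scheduling of
# loop iterations on heterogeneous workers with per-chunk load fluctuations.
import heapq
import random

random.seed(1)
N_ITERS, CHUNK = 100_000, 500
BASE_SPEED = [1.0, 0.8, 0.5, 0.3]               # heterogeneous worker speeds

def perturbed(speed):
    return speed * random.uniform(0.4, 1.0)     # transient external load

def static_makespan():
    share = N_ITERS // len(BASE_SPEED)          # equal block per worker
    times = []
    for s in BASE_SPEED:
        t = 0.0
        for _ in range(0, share, CHUNK):
            t += CHUNK / perturbed(s)
        times.append(t)
    return max(times)

def dynamic_makespan():
    remaining = N_ITERS
    heap = [(0.0, w) for w in range(len(BASE_SPEED))]
    heapq.heapify(heap)
    makespan = 0.0
    while remaining > 0:
        t, w = heapq.heappop(heap)              # earliest-free worker grabs work
        chunk = min(CHUNK, remaining)
        remaining -= chunk
        t += chunk / perturbed(BASE_SPEED[w])
        makespan = max(makespan, t)
        heapq.heappush(heap, (t, w))
    return makespan

print(f"static makespan:  {static_makespan():9.1f}")
print(f"dynamic makespan: {dynamic_makespan():9.1f}")
```

Under these made-up numbers the dynamic schedule finishes roughly twice as fast, because fast or lightly loaded workers keep pulling new chunks while slow ones fall behind; that tolerance to fluctuating load is the kind of behavior the flexibility metric quantifies.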
Towards Dependability Testing of MapReduce Systems
J. Marynowski
{"title":"Towards Dependability Testing of MapReduce Systems","authors":"J. Marynowski","doi":"10.1109/IPDPSW.2013.213","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.213","url":null,"abstract":"MapReduce systems have been widely used by several applications, from search tools to financial and commercial systems. There is considerable enthusiasm around these systems due to their simplicity and scalability. However, there is a lack of a testing approach, and a framework to ensure they are dependable. The goal of this PhD is to propose a complete dependability testing solution for MapReduce systems. This solution is a model-based approach for generating representative fault cases, and a testing framework to control their execution automatically. Initial experiments demonstrate promising results with HadoopTest framework coordinating fault cases across distributed MapReduce components and identifying faulty systems.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116500422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Teaching Parallel and Distributed Computing to Undergraduate Computer Science Students
Marcelo Arroyo
{"title":"Teaching Parallel and Distributed Computing to Undergraduate Computer Science Students","authors":"Marcelo Arroyo","doi":"10.1109/IPDPSW.2013.276","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.276","url":null,"abstract":"Parallel and distributed systems programming skills has become a common requirement in the development of modern applications. It is imperative that any updated curriculum in computer science must include these topics not only as advanced (often elective) programming courses. There is a general consensus that parallel programming topics should be spread throughout the undergraduate curriculum.In this paper we describe how parallel and distributed computing and, specifically concurrent and parallel programming topics, are being included in the updated computer science curriculum of the degree in computer science at the Río Cuarto National University, Argentina. Also, we cover some suggested approaches for teaching parallel programming topics in a set of core courses to achieve a consistent, increasing and complete training in high performance computing. To achieve these goals, we propose a set of modules which includes basic and advanced high performance computing and some parallel and distributed systems programming topics, to be included in core courses. Finally, we describe the use of existing tools and the development of new high level tools, as parallel patterns, useful for teaching parallel programming which can be used in different courses. The aim of using these tools and techniques is to reduce the gap between sequential and parallel programming.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121471089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14