Ana Gainaru, Brice Goglin, Valentin Honoré, G. Aupy, P. Raghavan, Y. Robert, Hongyang Sun
{"title":"Reservation and Checkpointing Strategies for Stochastic Jobs","authors":"Ana Gainaru, Brice Goglin, Valentin Honoré, G. Aupy, P. Raghavan, Y. Robert, Hongyang Sun","doi":"10.1109/IPDPS47924.2020.00092","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00092","url":null,"abstract":"In this paper, we are interested in scheduling and checkpointing stochastic jobs on a reservation-based platform, whose cost depends both (i) on the reservation made, and (ii) on the actual execution time of the job. Stochastic jobs are jobs whose execution time cannot be determined easily. They arise from the heterogeneous, dynamic and data-intensive requirements of new emerging fields such as neuroscience. In this study, we assume that jobs can be interrupted at any time to take a checkpoint, and that job execution times follow a known probability distribution. Based on past experience, the user has to determine a sequence of fixed-length reservation requests, and to decide whether the state of the execution should be checkpointed at the end of each request. The objective is to minimize the expected cost of a successful execution of the jobs. We provide an optimal strategy for discrete probability distributions of job execution times, and we design fully polynomial-time approximation strategies for continuous distributions with bounded support. These strategies are then experimentally evaluated and compared to standard approaches such as periodic-length reservations and simple checkpointing strategies (either checkpoint all reservations, or none). 
The impact of an imprecise knowledge of checkpoint and restart costs is also assessed experimentally.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"22 1","pages":"853-863"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75089361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas Buoncristiani, Sanjana Shah, D. Donofrio, J. Shalf
{"title":"Evaluating the Numerical Stability of Posit Arithmetic","authors":"Nicholas Buoncristiani, Sanjana Shah, D. Donofrio, J. Shalf","doi":"10.1109/IPDPS47924.2020.00069","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00069","url":null,"abstract":"The Posit number format has been proposed by John Gustafson as an alternative to the IEEE 754 standard floating-point format. Posits offer a unique form of tapered precision, whereas IEEE floating-point numbers provide the same relative precision across most of their representational range. Posits are argued to have a variety of advantages, including better numerical stability and simpler exception handling. The objective of this paper is to evaluate the numerical stability of Posits for solving linear systems, where we evaluate the Conjugate Gradient Method to demonstrate an iterative solver and Cholesky Factorization to demonstrate a direct solver. We show that Posits do not consistently improve stability across a wide range of matrices, but we demonstrate that a simple rescaling of the underlying matrix improves convergence rates for the Conjugate Gradient Method and reduces backward error for Cholesky Factorization. 
We also demonstrate that 16-bit Posit outperforms Float16 for mixed precision iterative refinement - especially when used in conjunction with a matrix re-scaling strategy recently proposed by Nicholas Higham.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"612-621"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75744079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and Evaluation of a Hardware Decentralized Synchronization Lock for MPSoCs","authors":"M. France-Pillois, Jérôme Martin, F. Rousseau","doi":"10.1109/IPDPS47924.2020.00117","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00117","url":null,"abstract":"Each generation of shared memory Multi-Processor System-on-Chips (MPSoCs) tends to embed more and more computing units. The cores of modern MPSoCs are often grouped into clusters communicating with each other through Networks on Chip (NoCs). Having efficient scalable synchronization mechanisms is then mandatory to benefit from the high parallelism they offer. In this work, we propose an innovative hardware support for synchronization locks. First of all, a non-intrusive measurement tool-chain allows us to prove a fundamental hypothesis for optimizing the lock mechanism: although a lock may be used, at runtime, by various cores belonging to different clusters, it is often reused by the last core which has released it. Based on this observation, we provide a hardware decentralized solution to manage dynamic re-homing of locks in a dedicated memory, close to the latest access-granted core. This reduces overall access latency and network traffic in case of reuse of the lock within the same cluster. This paper presents our solution, called Lockality, and its performance evaluation on a characteristic MPSoC running on a hardware emulator. 
Experiments show large gains at low level (physical lock acquisition) as well as at the application level.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"26 1","pages":"1112-1121"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74445627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael R. Wyatt, Stephen Herbein, Kathleen Shoga, T. Gamblin, M. Taufer
{"title":"CanarIO: Sounding the Alarm on IO-Related Performance Degradation","authors":"Michael R. Wyatt, Stephen Herbein, Kathleen Shoga, T. Gamblin, M. Taufer","doi":"10.1109/IPDPS47924.2020.00018","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00018","url":null,"abstract":"Users interact with High Performance Computing (HPC) machines through batch systems, which take user job submissions and allocate them to computing resources. While some resource managers have a generalized resource model, in nearly all modern systems, nodes are the only resource managed. Other resources, such as parallel file systems, are also necessary for jobs to make progress, but schedulers are blind to these resources. Facility staff can manually detect critical problems and manually hold jobs that need particular file systems, but this requires manual monitoring. Without human intervention, modern schedulers will happily run jobs whose required resources are not available. As a result, resources are wasted when IO-intensive jobs are scheduled on file systems with degraded performance. We introduce CanarIO, a tool for predicting the IO-sensitivity of HPC jobs and detecting IO-related performance degradation on HPC systems. CanarIO uses a set of \"canary\" IO probes run at regular intervals on the system. Using performance measurements from these jobs, CanarIO builds classifiers that can determine which jobs are IO-sensitive and when file system performance is degraded. We demonstrate the accuracy of our tool with a simulation of system execution using real HPC data. Specifically, we detect 37.5% of IO degradation events and correctly identify >90% of IO-sensitive jobs. We show that with CanarIO predictions we recover >1,500 node-hours in 10 days, with a potential maximum of nearly 10,000 node-hours. 
CanarIO is the first step necessary for augmenting schedulers to be resource-aware.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"111 1","pages":"73-83"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81771597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renping Liu, Xianzhang Chen, Yujuan Tan, Runyu Zhang, Liang Liang, Duo Liu
{"title":"SSDKeeper: Self-Adapting Channel Allocation to Improve the Performance of SSD Devices","authors":"Renping Liu, Xianzhang Chen, Yujuan Tan, Runyu Zhang, Liang Liang, Duo Liu","doi":"10.1109/IPDPS47924.2020.00103","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00103","url":null,"abstract":"Solid state drives (SSDs) have been widely deployed in high performance data center environments, where multiple tenants usually share the same hardware. However, traditional SSDs distribute the users’ incoming data uniformly across all SSD channels, which leads to numerous access conflicts. Meanwhile, SSDs that statically allocate one or several channels to one tenant sacrifice device parallelism and capacity. When SSDs are shared by tenants with different access patterns, inappropriate channel allocation results in SSD performance degradation. In this paper, we propose a self-adapting channel allocation mechanism, named SSDKeeper, for multiple tenants to share one SSD. SSDKeeper employs a machine learning assisted algorithm to take full advantage of SSD parallelism while providing performance isolation. By collecting multi-tenant access patterns and training a model, SSDKeeper selects an optimal channel allocation strategy for multiple tenants with the lowest overall response latency. 
Experimental results show that SSDKeeper improves the overall performance by 24% with negligible overhead.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"966-975"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82015838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Busy-Time Scheduling on Heterogeneous Machines","authors":"Runtian Ren, Xueyan Tang","doi":"10.1109/IPDPS47924.2020.00040","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00040","url":null,"abstract":"We study a busy-time scheduling problem on heterogeneous machines (BSHM) which is motivated by server acquisition and task dispatching in cloud computing. The input of BSHM is a set of interval jobs, each specified by a size, an arrival time and a departure time. When a job arrives, it must be placed onto a machine immediately. The execution of a job cannot be interrupted until it departs. At any time, the total size of the jobs running on a machine cannot exceed the machine’s capacity. m different types of machines are available and abundant machines are provided for each type. A type-i machine has a capacity gi and is charged at a cost rate ri when busy (running jobs). The target of BSHM is to schedule the given set of jobs onto machines with the minimum accumulated cost. Suppose the machine types are sorted by their capacities so that g1 ≤ g2 ≤ ... ≤ gm. We first consider two typical cases of BSHM. In BSHM-DEC, $\\frac{r_i}{g_i} \\geq \\frac{r_{i+1}}{g_{i+1}}$ holds for each i. In BSHM-INC, $\\frac{r_i}{g_i} \\leq \\frac{r_{i+1}}{g_{i+1}}$ holds for each i. For each case, we propose an O(1)-approximation algorithm in the offline setting and an O(μ)-competitive algorithm in the non-clairvoyant online setting. 
Finally, we discuss how the scheduling strategies developed for these two cases can be combined to deal with the general BSHM problem.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"28 1","pages":"306-315"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82324259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rory Hector, R. Vaidyanathan, Gokarna Sharma, J. Trahan
{"title":"Optimal Convex Hull Formation on a Grid by Asynchronous Robots with Lights","authors":"Rory Hector, R. Vaidyanathan, Gokarna Sharma, J. Trahan","doi":"10.1109/IPDPS47924.2020.00111","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00111","url":null,"abstract":"We consider the distributed setting of n autonomous mobile robots that operate in Look-Compute-Move (LCM) cycles and communicate with other robots using a constant number of colored lights (the robots with lights model). We assume obstructed visibility where a robot cannot see another robot if a third robot is positioned between them on the straight line connecting them. In addition, we consider a grid-based terrain embedded in the 2-dimensional Euclidean plane that restricts each robot movement to one of the four neighboring grid points from its current position. This grid setting is a natural discretization of the 2-dimensional real plane and extends the robot swarm model in directions of greater applicability. The Convex Hull Formation problem is to relocate the n robots (starting at arbitrary, but distinct, initial positions) so that each robot is positioned on a vertex of a convex hull. In this paper, we provide two asynchronous algorithms for Convex Hull Formation, both using a constant number of colors. Key measures of the algorithms’ performance include the time taken and the space occupied (measured as the perimeter of the smallest rectangle enclosing the convex hull formed). The first O(max{n^2, D})-time and O(n^2)-perimeter algorithm serves to introduce key ideas, where D is the diameter of the initial configuration. The second algorithm runs in $O\\left(\\max\\left\\{n^{\\frac{3}{2}}, D\\right\\}\\right)$ time with a perimeter of $O\\left(n^{\\frac{3}{2}}\\right)$. 
We also prove lower bounds of $\\Omega\\left(n^{\\frac{3}{2}}\\right)$ for both the time and perimeter for any Convex Hull Formation algorithm; that is, we establish our second algorithm as optimal in both time and perimeter.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"2016 1","pages":"1051-1060"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78722732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. P. L. Carvalho, B. Honorio, A. Baldassin, G. Araújo
{"title":"Improving Transactional Code Generation via Variable Annotation and Barrier Elision","authors":"J. P. L. Carvalho, B. Honorio, A. Baldassin, G. Araújo","doi":"10.1109/IPDPS47924.2020.00107","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00107","url":null,"abstract":"With chip manufacturers such as Intel, IBM and ARM offering native support for transactional memory in their instruction set architectures, memory transactions are on the verge of being considered a genuine application tool rather than just an interesting research topic. Despite this recent increase in popularity on the hardware side of transactional memory (HTM), software support for transactional memory (STM) is still scarce and the only compiler with transactional support currently available, the GNU Compiler Collection (GCC), does not generate code that achieves desirable performance. This paper presents a detailed analysis of transactional code generated by GCC and by a proposed transactional memory support added to the Clang/LLVM compiler framework. Experimental results support the following contributions: (a) STM’s performance overhead is due to an excessive amount of read and write barriers added by the compiler; (b) a new annotation mechanism for the Clang/LLVM compiler framework that aims to overcome the barrier over-instrumentation problem by allowing programmers to specify which variables should be free from transactional instrumentation; (c) a profiling tool that ranks the most accessed memory locations at runtime, working as a guiding tool for programmers to annotate the code. 
Furthermore, it is revealed that, by correctly using the annotations on just a few lines of code, it is possible to reduce the total number of instrumented barriers by 95% and to achieve speed-ups of up to 7× when compared to the original code generated by GCC and the Clang compiler.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"1008-1017"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90110298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Steering Chair","authors":"V. Prasanna","doi":"10.1109/hipc.2015.62","DOIUrl":"https://doi.org/10.1109/hipc.2015.62","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88098453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}