{"title":"Resilience at Extreme Scale and Connections with Other Domains","authors":"L. Bautista-Gomez","doi":"10.1109/IPDPS53621.2022.00058","DOIUrl":"https://doi.org/10.1109/IPDPS53621.2022.00058","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"36 10 1","pages":"537"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76875895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"12 Ways to Fool the Masses with Irreproducible Results","authors":"L. Barba","doi":"10.1109/IPDPS49936.2021.00050","DOIUrl":"https://doi.org/10.1109/IPDPS49936.2021.00050","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"99 1","pages":"422"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73845648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tale of Two C's: Convergence and Composability","authors":"I. Altintas","doi":"10.1109/IPDPS49936.2021.00009","DOIUrl":"https://doi.org/10.1109/IPDPS49936.2021.00009","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"45 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73747903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Varity: Quantifying Floating-Point Variations in HPC Systems Through Randomized Testing","authors":"I. Laguna","doi":"10.1109/IPDPS47924.2020.00070","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00070","url":null,"abstract":"Floating-point arithmetic can be confusing and it is sometimes misunderstood by programmers. While numerical reproducibility is desirable in HPC, it is often unachievable due to the different ways compilers treat floating-point arithmetic and generate code around it. This reproducibility problem is exacerbated in heterogeneous HPC systems where code can be executed on different floating-point hardware, e.g., a host and a device architecture, producing in some situations different numerical results. We present VARITY, a tool to quantify floating-point variations in heterogeneous HPC systems. Our approach generates random test programs for multiple architectures (host and device) using the compilers that are available in the system. Using differential testing, it compares floating-point results and identifies unexpected variations in the program results. The results can guide programmers in choosing the compilers that produce the most similar results in a system, which is useful when numerical reproducibility is critical. By running 50,000 experiments with Varity on a system with IBM POWER9 CPUs, NVIDIA V100 GPUs, and four compilers (gcc, clang, xl, and nvcc), we identify and document several programs that produce significantly different results for a given input when different compilers or architectures are used, even when a similar optimization level is used everywhere.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"11 1","pages":"622-633"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82408676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning an Effective Charging Scheme for Mobile Devices","authors":"Tang Liu, Baijun Wu, Wenzheng Xu, Xianbo Cao, Jiangen Peng, Hongyi Wu","doi":"10.1109/IPDPS47924.2020.00030","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00030","url":null,"abstract":"Wireless charging has been demonstrated as a promising technology for prolonging device operational lifetimes in Wireless Rechargeable Networks (WRNs). To schedule a mobile charger to move along a predesigned trajectory to charge devices, most existing studies assume that the precise location information of devices is already known. Unfortunately, this assumption does not always hold in real mobile applications, because the activities of the vast majority of mobile devices carried by mobile agents appear dynamic and random. To the best of our knowledge, this is the first work to study how to wirelessly charge mobile devices with non-deterministic mobility. We aim to provide effective charging service to them, subject to the energy capacity of the mobile charger. Then, we formalize the effective charging problem as a charging reward maximization problem (CRMP), where the amount of reward obtained by charging a device is inversely proportional to the residual lifetime of the device. To derive an effective charging heuristic, an algorithm based on Reinforcement Learning (RL) is proposed. The evaluation results show that the RL-based charging algorithm achieves excellent charging effectiveness. We further interpret the learned heuristic to gain deep and valuable insights into the design options.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"9 1","pages":"202-211"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87490637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smartly Handling Renewable Energy Instability in Supporting A Cloud Datacenter","authors":"Jiechao Gao, Haoyu Wang, Haiying Shen","doi":"10.1109/IPDPS47924.2020.00084","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00084","url":null,"abstract":"The size and energy consumption of datacenters have been increasing significantly over the past years. As a result, datacenters’ increasing electricity monetary cost, energy consumption and harmful gas emissions have become a severe problem. Renewable energy supply is widely seen as a promising solution. However, the instability of renewable energy brings about a new challenge since insufficient energy supply may lead to job running interruptions or failures. Though previous works attempt to more accurately predict the amount of produced renewable energy, due to the instability of its influencing factors (e.g., wind, temperature), sufficient renewable energy supply cannot always be guaranteed. To handle this problem, in this paper, we propose allocating jobs with the same service-level-objective (SLO) level to the same physical machine (PM) group, and powering each PM group with renewable energy generators whose probability of producing no less than the group's energy demand is no less than its SLO. This ensures that insufficient renewable energy supply will not lead to SLO violations. We use a deep learning technique to predict, for each renewable energy source, the probability of producing no less than each given amount and to predict the energy demands of each PM area. We formulate an optimization problem: how to match renewable energy resources with different instabilities to different PM groups as energy supply in order to minimize the number of SLO violations (due to interruption from insufficient renewable energy supply), total energy monetary cost and total carbon emission. We then use a reinforcement learning method and a linear programming method to solve the optimization problem. Real-trace-driven experiments show that our method can achieve far fewer SLO violations and much lower total energy monetary cost and total carbon emission than other methods.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"12 1","pages":"769-778"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88642139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format","authors":"Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, Minghua Zhang","doi":"10.1109/IPDPS47924.2020.00020","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00020","url":null,"abstract":"The finite-difference dynamical core based on the equal-interval latitude-longitude mesh has been widely used for numerical simulations of the Atmospheric General Circulation Model (AGCM). Previous work utilizes different filtering schemes to alleviate the instability problem incurred by the unequal physical spacing at different latitudes, but they all incur high communication and computation overhead and become a scaling bottleneck. This paper proposes a new leap-format finite-difference computing scheme. It generalizes the usual finite-difference format with adaptive wider intervals and is able to maintain the computational stability in the grid updating. Therefore, the costly filtering scheme is eliminated. The new scheme is parallelized with a shifting communication method and implemented with fine communication optimizations based on a 3D decomposition. With the proposed leap-format computation scheme, the communication overhead of the AGCM is significantly reduced and good load balance is exhibited. The simulation results verify the correctness of the new leap-format scheme. The new scheme achieves the speed of 16.6 simulation-year-per-day (SYPD) and up to 3.3x speedup over the latest implementation.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"80 1","pages":"95-104"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89272063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ConMidbox: Consolidated Middleboxes Selection and Routing in SDN/NFV-Enabled Networks","authors":"Guiyan Liu, Songtao Guo, Pan Li, Liang Liu","doi":"10.1109/IPDPS47924.2020.00101","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00101","url":null,"abstract":"Software defined networking (SDN) and network function virtualization (NFV) can flexibly manage software middlebox based services, and the consolidated middlebox model is able to simplify traffic routing and reduce the number of routing rules in the SDN-enabled switches. However, different network functions in middleboxes may change the volume of processed traffic, thus high congestion may occur in specific bottleneck links if middlebox selection and traffic routing are not well jointly planned. Besides, in a statically switch-controller configured SDN, traffic dynamics will not only affect the link load in the data plane, but also pose a challenge to controller load balancing. Therefore, it’s necessary to achieve better quality-of-service (QoS) performance in both the control and data planes. This paper first formulates it as a joint traffic-aware consolidated middleboxes selection and routing (JTMSR) problem and proves its NP-hardness. Then, a two-phase RL_RFRD algorithm is designed to achieve the controller and link load balancing where the first phase is to redirect selected flows by applying wildcard rules and the second phase is to find a fine-grained routing path by a rounding-based algorithm with a bounded approximation factor. Finally, the extensive simulation results demonstrate that the proposed algorithm has near-optimal controller load balancing and link load balancing performance and reduces response time by about 2x-5x compared with other algorithms.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"9 1","pages":"946-955"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87225118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}