H. Silva, C. Schepke, Natiele Lucca, C. Cristaldo, Dalmo Paim de Oliveira
{"title":"Parallel OpenMP and OpenACC Mixing Layer Simulation","authors":"H. Silva, C. Schepke, Natiele Lucca, C. Cristaldo, Dalmo Paim de Oliveira","doi":"10.1109/pdp55904.2022.00036","DOIUrl":"https://doi.org/10.1109/pdp55904.2022.00036","url":null,"abstract":"It is estimated that up to 25% of the grain crop ends up being lost in the post-harvest. The correct drying of the beans is one of the measures to contain this loss. As the grain mass is a set of solid and empty spaces, its drying could be considered a problem of the coupled open-porous medium. In this paper, a mathematical and computer simulation model was proposed, which describes the convection in a free flow with a porous obstacle applied to the drying of the grain. A computational fluid dynamics scheme was implemented in FORTRAN using Finite Volume to simulate and compute the numerical solutions. The code is parallel implemented using OpenMP and OpenACC programming interfaces. As a result, there was a significant reduction in processing time in both cases. The total simulation time was eight times less for a multicore architecture (16 physical cores) and 17.3 times using a single GPU (Quadro M5000).","PeriodicalId":210759,"journal":{"name":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling","authors":"Masaki Furukawa, Hiroki Matsutani","doi":"10.1109/pdp55904.2022.00020","DOIUrl":"https://doi.org/10.1109/pdp55904.2022.00020","url":null,"abstract":"A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment and a Learner node optimizes their DQN model. Since data transfer between Actor and Learner nodes increases depending on the number of Actor nodes and their experience size, communication overhead between them is one of major performance bottlenecks. In this paper, their communication performance is optimized by using DPDK (Data Plane Development Kit). Specifically, DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected with a 40GbE (40Gbit Ethernet) network. Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.","PeriodicalId":210759,"journal":{"name":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122317103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}