2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP): Latest Publications

Configurable synthetic application for studying malleability in HPC
Authors: Iker Martín-Álvarez, J. Aliaga, María Isabel Castillo, Sergio Iserte
DOI: 10.1109/PDP59025.2023.00027 | Published: March 2023
Abstract: Nowadays, the throughput of large compute clusters can be improved by developing malleable applications: while such an application executes within a job, the resource management system (RMS) can modify its resource allocation in order to increase global throughput. The reallocation of resources decomposes into several steps, each of which can be completed in different ways. To find the best alternatives, this paper introduces a configurable synthetic iterative malleable MPI application capable of modifying, at execution time, the number of MPI processes according to several parameters. The application includes a performance module that measures the time of each stage within a step, from process management to data redistribution. In this way, the analysis of different scenarios makes it possible to determine how an application should be reconfigured under different circumstances. At the same time, this tool can be used to create workloads for analysing the impact of malleability on a system and on the work in progress.
Citations: 0
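The reconfiguration described in the abstract above includes a data-redistribution stage. As a rough illustration only (not the paper's code), the bookkeeping for redistributing a block-distributed array when the number of MPI ranks changes can be sketched in plain Python:

```python
# Hypothetical sketch: compute which element ranges each old rank must send
# to each new rank when a block-distributed array of n elements is
# redistributed after resizing from old_np to new_np processes.

def block_range(rank, nprocs, n):
    """Contiguous block [lo, hi) owned by `rank` in a block distribution."""
    base, rem = divmod(n, nprocs)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

def redistribution_plan(old_np, new_np, n):
    """Map (old_rank, new_rank) -> overlap of their blocks, i.e. the
    element range old_rank must send to new_rank."""
    plan = {}
    for old in range(old_np):
        olo, ohi = block_range(old, old_np, n)
        for new in range(new_np):
            nlo, nhi = block_range(new, new_np, n)
            lo, hi = max(olo, nlo), min(ohi, nhi)
            if lo < hi:
                plan[(old, new)] = (lo, hi)
    return plan
```

For example, `redistribution_plan(2, 3, 12)` shows that rank 0 keeps elements [0, 4) and sends [4, 6) to new rank 1; the actual transfers would then be issued with MPI point-to-point or collective calls.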
Sponsors and Supporters: PDP 2023
DOI: 10.1109/pdp59025.2023.00009 | Published: March 2023
Citations: 0
Performance Analysis and Benchmarking of a Temperature Downscaling Deep Learning Model
Authors: Karthick Panner Selvam, M. Brorsson
DOI: 10.1109/PDP59025.2023.00010 | Published: March 2023
Abstract: We present a detailed analysis and performance characterization of a statistical temperature downscaling application used in the MAELSTROM EuroHPC project. This application uses a deep learning methodology to convert low-resolution atmospheric temperature states into high-resolution ones. We have performed in-depth profiling and roofline analysis at different levels (operators, training, distributed training, inference) of the downscaling model on different hardware architectures (NVIDIA V100 and A100 GPUs). Finally, we compare the training and inference cost of the downscaling model across various cloud providers. Our results identify the model's bottlenecks, which can be used to enhance the model architecture and to determine hardware configurations that utilize the HPC system efficiently. Furthermore, we provide a comprehensive methodology for in-depth profiling and benchmarking of deep learning models.
Citations: 0
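The roofline analysis mentioned in the abstract above bounds a kernel's attainable performance by the lower of its compute roof and its memory roof. A minimal sketch, using assumed (not measured) peak figures in the rough range of an NVIDIA V100:

```python
# Illustrative roofline model (not from the paper). The peak numbers are
# assumptions based on rough public V100 figures, not measurements.

def roofline_attainable_flops(arith_intensity, peak_flops, peak_bw):
    """Attainable FLOP/s = min(compute roof, arith_intensity * memory roof)."""
    return min(peak_flops, arith_intensity * peak_bw)

PEAK_FLOPS = 15.7e12   # ~15.7 TFLOP/s FP32 (assumed)
PEAK_BW = 900e9        # ~900 GB/s HBM2 bandwidth (assumed)

# Ridge point: arithmetic intensity above which a kernel is compute-bound.
ridge = PEAK_FLOPS / PEAK_BW
```

A kernel with intensity 1 FLOP/byte would be memory-bound (capped at 0.9 TFLOP/s here), while one above the ridge point (~17 FLOP/byte with these numbers) hits the compute roof.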
Analyzing Data Reordering of a combined MPI and AVX execution of a Jacobi Method
Authors: T. Jakobs, Sebastian Kratzsch, G. Rünger
DOI: 10.1109/PDP59025.2023.00032 | Published: March 2023
Abstract: Combining different parallel programming environments makes it possible to exploit all heterogeneous levels of parallel hardware, which can lead to optimized application programs. An exemplary combination, investigated in this article, is the use of the Message Passing Interface (MPI) together with vectorization based on the Advanced Vector Extensions (AVX). A special emphasis lies on MPI data orderings and their influence on AVX vectorization strategies. The Jacobi method is used as a case study, for which several parallel program versions have been implemented and analyzed.
Citations: 0
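The Jacobi method used as the case study above is, at its core, a simple iterative sweep over the rows of a linear system. A minimal sequential Python sketch for reference (the paper's versions additionally distribute the data with MPI and vectorize the inner loop with AVX):

```python
# Sequential Jacobi iteration for A x = b (illustration only; convergence
# requires A to be, e.g., strictly diagonally dominant).

def jacobi(A, b, iters=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_new = [0.0] * n
        for i in range(n):
            # Sum of off-diagonal contributions using the previous iterate.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i][i]
        x = x_new
    return x
```

Because every `x_new[i]` depends only on the previous iterate `x`, the row updates are independent, which is exactly what makes the method amenable to both MPI distribution and SIMD vectorization, and why the data ordering studied in the paper matters.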
Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks
Authors: Raúl Marichal, Guillermo Toyos, Ernesto Dufrechu, P. Ezzatti
DOI: 10.1109/PDP59025.2023.00036 | Published: March 2023
Abstract: The growing need to perform neural network inference with low latency is giving rise to a broad spectrum of heterogeneous devices with deep learning capabilities. Therefore, obtaining the best performance from each device and choosing the most suitable platform for a given problem has become challenging. This paper evaluates multiple inference platforms using architecture-aware optimizations for convolutional neural networks. Specifically, we use the TensorRT and OpenVINO frameworks for hardware optimizations on top of the platform-aware NetAdapt algorithm. The experimental evaluation shows that on MobileNet and AlexNet, using NetAdapt with TensorRT or OpenVINO can improve latency by up to 10× and 5.3×, respectively. Moreover, a throughput test using different batch sizes showed variable performance improvements on the different devices. The discussion of the experimental results can guide the selection of devices and optimizations for different AI solutions.
Citations: 1
Convolutional graph neural network training scalability for molecular docking
Authors: Kevin Crampon, Alexis Giorkallos, S. Baud, L. Steffenel
DOI: 10.1109/PDP59025.2023.00042 | Published: March 2023
Abstract: The use of deep learning is growing in many numerical simulation fields, and drug discovery is no exception. Before proceeding with in vitro and then in vivo experiments, drug discovery now relies on in silico techniques such as molecular docking to narrow the number of experiments and identify the best candidates. This method explores the receptor surface and the ligand's conformational space, producing numerous ligand-receptor poses. All these poses are then scored and ranked by a scoring function that predicts the best poses among them, making it possible to compare different ligands for a given receptor, or different targets for a given ligand. Since the 2010s, numerous deep learning methods have been used to tackle this problem. Nowadays, there are two significant trends in deep learning for molecular docking: (i) the augmentation of available structural data and (ii) the use of a new kind of neural network, the graph convolutional neural network (GCN). In this paper, we study the training scalability of a GCN (a molecular-complex scoring function) on an increasing number of GPUs and with a variety of batch sizes. After a hyperparameter analysis, we achieve an 80% reduction in training time, but this improvement sometimes involves a degradation of performance metrics that the final users must weigh.
Citations: 0
An Auto-Tuning Method for High-Bandwidth Low-Latency Approximate Interconnection Networks
Authors: S. Hirasawa, M. Koibuchi
DOI: 10.1109/PDP59025.2023.00011 | Published: March 2023
Abstract: Next-generation interconnection networks, such as the 400 GbE specification, impose a Forward Error Correction (FEC) operation, such as RS-FEC (544,514), on incoming packets at every switch. The significant FEC latency increases the end-to-end communication latency, which degrades application performance in parallel computers. To resolve this problem, prior work presented error-prone high-bandwidth low-latency networks that do not perform the FEC operation. They enable high-bandwidth approximate data transfer and low-bandwidth perfect data transfer to support various kinds of parallel applications subject to different levels of bit-flip probability. As the number of approximate data transfers increases, parallel applications can obtain a significant speedup at the expense of moderately degraded quality of results (QoR). However, it is difficult for users to identify whether each communication should be approximate or not so as to obtain the shortest execution time with sufficient QoR for a given parallel application. In this study, we apply an auto-tuning framework to approximate interconnection networks: it automatically identifies whether each communication should use approximate data transfer by attempting thousands of executions of a given parallel application, varying the possible communication parameters to find the best execution configuration. The multiple executions generate different positions of bit flips in the communication data, which may produce different qualities of results even when the same parameters are used. Although this uncertainty complicates the optimization, many offline trials lead to a high probability of successful program execution. Evaluation results show that high-performance MPI applications with our auto-tuning method achieve a 1.30× average performance improvement on error-prone high-performance approximate networks.
Citations: 0
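The auto-tuning loop described in the abstract above searches over assignments of approximate vs. exact transfer per communication, keeping the fastest configuration whose quality of results stays acceptable. A toy exhaustive tuner, purely illustrative and not the paper's framework (`run` is a hypothetical callback that executes the application under one configuration):

```python
# Toy auto-tuner sketch: try every assignment of approximate (True) vs.
# exact (False) transfer for each communication channel and keep the
# fastest configuration meeting the quality-of-result threshold.
from itertools import product

def autotune(channels, run, qor_threshold):
    """`run(config)` returns (exec_time, qor) for a tuple of booleans,
    one per channel. Returns (best_config, best_time) or None."""
    best = None
    for config in product([False, True], repeat=channels):
        exec_time, qor = run(config)
        if qor >= qor_threshold and (best is None or exec_time < best[1]):
            best = (config, exec_time)
    return best
```

A real tuner would repeat each configuration several times, as the paper notes that bit flips land in different positions across runs and can change the QoR even for identical parameters.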
Distributed training and inference of deep learning solar energy forecasting models
Authors: Javier Campoy, Ignacio-Iker Prado-Rujas, J. L. Risco-Martín, Katzalin Olcoz, M. S. Pérez
DOI: 10.1109/PDP59025.2023.00035 | Published: March 2023
Abstract: Various accurate predictive models have been developed to forecast the amount of solar energy produced in a given area. These models are usually run in a centralized manner, taking irradiance inputs from a set of sensors deployed in that area. CAIDE is a framework that supports the deployment and analysis of solar plants following Model-Based Systems Engineering (MBSE) and Internet of Things (IoT) methodologies. However, the current solution performs the training and inference phases of the solar energy forecasting models centrally, not taking advantage of the distributed environment modeled by means of CAIDE. This work presents an extension of CAIDE that distributes the training and inference phases, obtaining performance improvements and better matching the inherently distributed topology of the sensor deployment.
Citations: 0
A Tamper-Resistant Storage Framework for Smart Grid security
Authors: S. D'Antonio, Roberto Nardone, Nicola Russo, Federica Uccello
DOI: 10.1109/PDP59025.2023.00022 | Published: March 2023
Abstract: In the past few years, the energy sector has been among the most targeted by cyber-criminals. Given the strong reliance of critical infrastructures on energy distribution, and the strategic value of such systems, the impact of intrusions and data breaches cannot be underestimated. In this scenario, data constitutes a critical asset to protect, especially as the latest technological developments have led to interconnected intelligent systems named smart grids. The consequences of data tampering, exposure, or loss can range from the disruption of essential services to serious risks to the environment, the economy, and people's safety. Data provenance, as the documentation of the origin of data and of the processes and methodology that led to it, can provide support against the aforementioned attacks. The present work addresses security issues in the energy domain by proposing the Advanced Tamper-Resistant Storage (ATRS), a novel framework for data provenance based on blockchain technology. The ATRS allows for the creation and storage of provenance records whose reliability is ensured by a tamper-resistance feature, enabled through the combination of blockchain and TLS-based communication. The framework, tailored and tested for the smart grid domain, can easily be customized for other critical use cases.
Citations: 0
Parallelizing Multipacting Simulation for the Design of Particle Accelerator Components
Authors: J. Galarza, J. Navaridas, J. A. Pascual, T. Romero, J. L. Muñoz, I. Bustinduy
DOI: 10.1109/PDP59025.2023.00030 | Published: March 2023
Abstract: Particle trajectory and collision simulation is a critical step in the design and construction of novel particle accelerator components. However, it requires a huge computational effort that can slow down the design process. We started from a sequential simulation program used to study an event called "multipacting". Our work explains the physical problem that is simulated and the implications it can have for the behavior of the components. We then analyze the original program's operation to find the best options for parallelization. We first developed a parallel version of the multipacting simulation and were able to accelerate the execution by up to ~35× with 48 or 56 cores. In the best cases, parallelization efficiency was maintained up to 16 cores (~95%), and the speed-up plateaus at around 40 to 48 cores. When this first parallelization effort was applied to multi-power simulations, we found that parallelism was severely limited, with a maximum speed-up of 20×. For this reason, we introduced a new method that improves parallelization efficiency for this second use case by using a shared processor pool for all electron simulations (OnePool). OnePool improved scalability by pushing the speed-up to over 32×.
Citations: 0
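The speed-up and efficiency figures reported above follow the standard definitions S = T1/Tp and E = S/p. A minimal helper for turning timings into such numbers (generic definitions, not the paper's measurement code):

```python
# Standard parallel performance metrics: speed-up S = T1/Tp and
# efficiency E = S/p for p cores.

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    return speedup(t_serial, t_parallel) / cores
```

With these definitions, the reported ~35× speed-up on 48 cores corresponds to a parallel efficiency of about 73%, consistent with the plateau the authors observe past 40 cores.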