2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Publications

Message from the PDSEC-22 Workshop Chairs
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00137
Sabine Roller, P. Strazdins, R. Couturier, N. E. Pour, Suzanne Shontz, T. Rauber, G. Runger, L. Yang
Abstract: Welcome to the 23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-22), held virtually on June 3rd, 2022 in Lyon, France, in conjunction with the 36th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022). This year, the workshop, like IPDPS, took place virtually due to the COVID-19 pandemic.
Citations: 0
Building scalable indexes that can be efficiently queried
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00034
C. Boucher
Abstract: Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. We later showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds and showed that storing them together with the r-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the r-index and find the thresholds simultaneously, in linear time and space with respect to the size of the prefix-free parse. Our implementation can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared to existing methods, ours used 2 to 11 times less memory and was 2 to 32 times faster for index construction. Moreover, our index was less than one thousandth the size of competing indexes for large collections of human chromosomes.
Citations: 0
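To make the MEM query mentioned in the abstract above concrete, here is a minimal sketch of what a maximal exact match is. It uses naive substring search, not the r-index, thresholds, or prefix-free parsing described by the author; the function name `maximal_exact_matches` and its parameters are hypothetical and chosen only for illustration.

```python
def maximal_exact_matches(pattern: str, text: str, min_len: int = 1):
    """Return (start, length) pairs for every MEM of `pattern` in `text`.

    A MEM is a substring of `pattern` that occurs in `text` and cannot be
    extended to the left or to the right and still occur in `text`.
    This is a naive illustration, not an index-based algorithm.
    """
    mems = []
    prev_end = 0  # end (exclusive) of the longest match starting at i - 1
    for i in range(len(pattern)):
        # Grow the match starting at i as far right as possible.
        j = i
        while j < len(pattern) and pattern[i:j + 1] in text:
            j += 1
        # pattern[i:j] is right-maximal by construction. It is also
        # left-maximal exactly when it ends beyond the previous match,
        # because otherwise pattern[i-1:j] would occur in text as well.
        if j > prev_end and j - i >= min_len:
            mems.append((i, j - i))
        prev_end = max(prev_end, j)
    return mems
```

Each candidate is checked with Python's `in` operator, so the cost grows quickly with input size; an index such as the one in the abstract exists precisely to avoid that.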
COMPOFF: A Compiler Cost model using Machine Learning to predict the Cost of OpenMP Offloading
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00074
Alok Mishra, Smeet Chheda, Carlos Soto, A. Malik, Meifeng Lin, Barbara M. Chapman
Abstract: The HPC industry is inexorably moving towards an era of extremely heterogeneous architectures, with more devices configured on any given HPC platform and potentially more kinds of devices, some of them highly specialized. Writing separate code for each target system for a given HPC application is not practical. A better solution is to use directive-based parallel programming models such as OpenMP. OpenMP provides a number of options for offloading a piece of code to devices like GPUs. To select the best of these options during compilation, most modern compilers use analytical models to estimate the cost of executing the original code and the different offloading code variants. Building such an analytical model for compilers is a difficult task that necessitates a lot of effort on the part of a compiler engineer. Recently, machine learning techniques have been successfully applied to build cost models for a variety of compiler optimization problems. In this paper, we present COMPOFF, a cost model that statically estimates the Cost of OpenMP OFFloading using a neural network model. We used six different transformations on a parallel code of the Wilson Dslash Operator to support GPU offloading, and we predicted their cost of execution on different GPUs using COMPOFF at compile time. Our results show that this model can predict offloading costs with a root mean squared error of less than 0.5 seconds. Our preliminary findings indicate that this work will make it much easier and faster for scientists and compiler developers to port legacy HPC applications that use OpenMP to new heterogeneous computing environments.
Citations: 4
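The accuracy figure in the abstract above is a root mean squared error (RMSE) between predicted and measured runtimes. The sketch below shows only how that metric is computed. The function name `rmse` is hypothetical, and it says nothing about how COMPOFF itself produces its predictions.

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences, then the square root.
    squared_error = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared_error / len(predicted))
```

Because the differences are in seconds, the RMSE is also in seconds, which is why the abstract can state its bound as "less than 0.5 seconds".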
Efficient Volume Estimation for Dynamic Environments using Deep Learning on the Edge
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00159
Chandan Kumar, Yamini Mathur, A. Jannesari
Abstract: The utility of edge devices has increased in volume estimation of uneven terrains. Existing techniques utilize several geo-tagged images of the landscape, captured in-flight by an edge device mounted on a UAV, to generate 3D models and perform volume estimation through manual boundary marking. These methods, although accurate, require significant time and human effort and depend heavily on GPS. We present an efficient deep learning framework that detects the object of interest and automatically determines the volume of the detected object on-the-fly, independent of GPS. Our method employs a stereo camera for depth sensing of the object and overlays a unit mesh grid over the object's boundary to perform volume estimation. We explore the trade-off between accuracy and computational complexity on variations of our technique. Experiments indicate that our method reduces the time for volume estimation by several orders of magnitude compared with existing methods. Also, to the best of our knowledge, this is the first method that can perform volume analysis in a dynamic environment.
Citations: 1
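The grid-based volume step in the abstract above can be pictured as summing, over every grid cell inside the object's boundary, the object's height times the cell area. The sketch below assumes a top-down depth map, a flat ground plane at a known depth, and a precomputed boundary mask. All of these, along with the name `estimate_volume`, are illustrative assumptions rather than details from the paper.

```python
def estimate_volume(depth, mask, ground_depth, cell_area=1.0):
    """Approximate volume from a per-cell depth grid.

    depth        : 2D list of camera-to-surface distances, one per grid cell
    mask         : 2D list of booleans, True where the cell lies on the object
    ground_depth : assumed camera-to-ground distance (flat ground)
    cell_area    : ground area covered by one grid cell
    """
    volume = 0.0
    for depth_row, mask_row in zip(depth, mask):
        for cell_depth, inside in zip(depth_row, mask_row):
            if inside:
                # Height above ground; clamp noise that falls below ground.
                height = max(ground_depth - cell_depth, 0.0)
                volume += height * cell_area
    return volume
```

A finer grid (smaller `cell_area`) tracks the surface more closely at the cost of more cells to process, which is one simple way to see the accuracy versus computation trade-off the abstract refers to.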
CGRA4HPC 2022 Invited Speaker: Practical, scalable, and easy-to-use CGRA for HPC
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00111
Ilan Tayari
Abstract: NextSilicon has developed technology that allows using large CGRAs for acceleration of HPC applications and workloads, with zero code changes to parallel high-level language codes. Mr. Tayari will present this innovative technology, what it means for CGRA designs, and how it fits in the HPC market.
Citations: 0
Performance Analysis of Multi-Containerized MD Simulations for Low-Level Resource Allocation
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00162
Shingo Okuno, Akira Hirai, Naoto Fukumoto
Abstract: This study discusses scheduling strategies to maximize ensemble throughput, which is the total throughput of multiple containers running simultaneously. Such a strategy is useful, for example, in ensemble runs of molecular dynamics (MD) simulations. To design the strategies, we need to tackle two major challenges: (1) how many containers and how many threads per container we should allocate, and (2) which low-level resources we should allocate to reflect workload characteristics. In particular, the latter challenge is important and inevitable for performance-sensitive applications because they effectively utilize low-level hardware such as simultaneous multi-threading (SMT) to maximize performance, while most container platforms do not handle the challenge. In this paper, as a preliminary experiment to implement scheduling strategies related to SMT, we examined whether ensemble throughput of MD simulations can be improved by deploying containers on separate logical cores even when they share the same physical cores. As a result, we obtained a 2.22-fold ensemble throughput compared with a one-container execution with 10 physical cores.
Citations: 1
MultiGrid on FPGA Using Data Parallel C++
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00147
C. Siefert, Stephen L. Olivier, G. Voskuilen, Jeffrey Young
Abstract: Centered on modern C++ and the SYCL standard for heterogeneous programming, Data Parallel C++ (DPC++) and Intel's oneAPI software ecosystem aim to lower the barrier to entry for the use of accelerators like FPGAs in diverse applications. In this work, we consider the usage of FPGAs for scientific computing, in particular with a multigrid solver, MueLu. We report on early experiences implementing kernels of the solver in DPC++ for execution on Stratix 10 FPGAs, and we evaluate several algorithmic design and implementation choices. These choices not only impact performance, but also shed light on the capabilities and limitations of DPC++ and oneAPI.
Citations: 3
17th IEEE International Workshop on Automatic Performance Tuning (iWAPT2022)
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00148
Che-Rung Lee, S. Ohshima
Abstract: The goal of the Seventeenth International Workshop on Automatic Performance Tuning (iWAPT2022) is to bring together researchers who are investigating automated techniques for constructing and/or adapting algorithms and software for high performance on modern complex machine architectures. iWAPT is a series of workshops that focus on research and techniques related to performance sustainability issues. The series provides an opportunity for researchers and users of automatic performance tuning (AT) technologies to exchange ideas and experiences acquired when applying such technologies to improve the performance of algorithms, libraries, and applications, in particular on cutting-edge computing platforms. The half-day workshops consist of presentations of research papers. Topics of interest include performance modeling; adaptive algorithms; autotuned numerical algorithms; libraries and scientific applications; empirical compilation; automated code generation; frameworks and theories of AT and software optimization; autonomic computing; and context-aware computing.
Citations: 0
On How to Push Efficient Medical Semantic Segmentation to the Edge: the SENECA approach
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00027
Raffaele Berzoini, E. D’Arnese, Davide Conficconi
Abstract: Semantic segmentation is the process of assigning each input image pixel a value representing a class, and it enables the clustering of pixels into object instances. It is a widely employed computer vision task in fields such as autonomous driving and medical image analysis. In particular, in medical practice, semantic segmentation identifies different regions of interest within an image, like different organs or anomalies such as tumors. Fully Convolutional Networks (FCNs) have been employed to solve semantic segmentation in different fields and have found their way into the medical one. In this context, the low contrast among semantically different areas, the constraints related to energy consumption, and the limited availability of computation resources increase the complexity and limit their adoption in daily practice. Based on these considerations, we propose SENECA to bring medical semantic segmentation to the edge with high energy efficiency and low segmentation time while preserving accuracy. We reached a throughput of 335.4 ± 0.34 frames per second on the FPGA, 4.65× better than its GPU counterpart, with a global dice score of 93.04% ± 0.07 and an improvement in energy efficiency of 12.7× with respect to the GPU.
Citations: 1
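The Dice score quoted in the abstract above measures the overlap between a predicted segmentation and the ground truth. The sketch below computes it for a single binary class over flattened pixel lists. The function name `dice_score` and the single-class simplification are assumptions for illustration; the paper's "global" score may aggregate classes differently.

```python
def dice_score(predicted, truth):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for two binary pixel lists."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled positive in both masks.
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    # Total positives across both masks.
    total = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no positive pixel, so the reported 93.04% indicates a high degree of overlap.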
Litener: An Accelerator-Enabled Lightweight Container for Edge Computing
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00158
Ryan Dyson, C. Reaño
Abstract: Containers supporting accelerators, such as Docker, include management tasks that require a significant amount of computing resources. While these resources are available in the Cloud, in other scenarios such as the Edge resources are more limited. An accelerator-enabled lightweight container would be desirable in such scenarios. In this paper, we analyse platforms to containerise applications using accelerators, including but not limited to Docker. After this analysis, we present a lightweight container, referred to as Litener, focused on using accelerators in scenarios with limited resources. Although the focus is on accelerators, many of the optimisations described can also be applied to scenarios that do not use accelerators. Experiments show a speedup of up to 7.78X when compared to other platforms.
Citations: 1