2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Papers, Page 7

Message from the PDSEC-22 Workshop Chairs

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00137

Sabine Roller, P. Strazdins, R. Couturier, N. E. Pour, Suzanne Shontz, T. Rauber, G. Runger, L. Yang

Citations: 0

Building scalable indexes that can be efficiently queried

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00034

C. Boucher

Abstract: Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. We later showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds and showed that storing them together with the r-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the r-index and find the thresholds simultaneously, in linear time and space with respect to the size of the prefix-free parse. Our implementation can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared to existing methods, ours used 2 to 11 times less memory and was 2 to 32 times faster for index construction. Moreover, our index was less than one thousandth the size of competing indexes for large collections of human chromosomes.
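To make the notion of a maximal exact match concrete, here is a minimal brute-force sketch. It is not the r-index, thresholds, or PFP algorithm from the abstract; it only illustrates what a MEM is, using plain substring search. The function names `matching_lengths` and `find_mems` are hypothetical.

```python
def matching_lengths(pattern, text):
    """For each start i, length of the longest prefix of pattern[i:] occurring in text."""
    lengths = []
    for i in range(len(pattern)):
        k = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + k < len(pattern) and pattern[i:i + k + 1] in text:
            k += 1
        lengths.append(k)
    return lengths


def find_mems(pattern, text, min_len=1):
    """Return (start, length) pairs of maximal exact matches of pattern in text."""
    lengths = matching_lengths(pattern, text)
    mems = []
    for i, length in enumerate(lengths):
        # Each match is right-maximal by construction. It is also left-maximal
        # when the match starting one position earlier does not cover it.
        left_maximal = i == 0 or lengths[i - 1] <= length
        if length >= min_len and left_maximal:
            mems.append((i, length))
    return mems


if __name__ == "__main__":
    for start, length in find_mems("CGTTACG", "ACGTACGT"):
        print(start, length, "CGTTACG"[start:start + length])
```

This sketch takes time roughly quadratic in the pattern length times the text length, which is exactly the cost that compressed indexes such as the r-index are designed to avoid on large, repetitive collections.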

Citations: 0

COMPOFF: A Compiler Cost model using Machine Learning to predict the Cost of OpenMP Offloading

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00074

Alok Mishra, Smeet Chheda, Carlos Soto, A. Malik, Meifeng Lin, Barbara M. Chapman

Abstract: The HPC industry is inexorably moving towards an era of extremely heterogeneous architectures, with more devices configured on any given HPC platform and potentially more kinds of devices, some of them highly specialized. Writing separate code for each target system for a given HPC application is not practical. A better solution is to use directive-based parallel programming models such as OpenMP. OpenMP provides a number of options for offloading a piece of code to devices like GPUs. To select the best of these options during compilation, most modern compilers use analytical models to estimate the cost of executing the original code and the different offloading code variants. Building such an analytical model for compilers is a difficult task that necessitates a lot of effort on the part of a compiler engineer. Recently, machine learning techniques have been successfully applied to build cost models for a variety of compiler optimization problems. In this paper, we present COMPOFF, a cost model that statically estimates the Cost of OpenMP OFFloading using a neural network model. We used six different transformations on a parallel code of the Wilson Dslash Operator to support GPU offloading, and we predicted their cost of execution on different GPUs using COMPOFF at compile time. Our results show that this model can predict offloading costs with a root mean squared error of less than 0.5 seconds. Our preliminary findings indicate that this work will make it much easier and faster for scientists and compiler developers to port legacy HPC applications that use OpenMP to new heterogeneous computing environments.
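The accuracy figure quoted in the abstract is a root mean squared error. As a minimal sketch of how that metric is computed from predicted and measured runtimes, assuming both are given in seconds (the function name `rmse` and the sample values are illustrative and not taken from the paper):

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between two equal-length sequences of runtimes."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up runtimes in seconds, for illustration only.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the errors are squared before averaging, a single badly mispredicted kernel raises the RMSE more than several small misses would.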

Citations: 4

Efficient Volume Estimation for Dynamic Environments using Deep Learning on the Edge

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00159

Chandan Kumar, Yamini Mathur, A. Jannesari

Abstract: The utility of edge devices has increased in volume estimation of uneven terrains. Existing techniques utilize several geo-tagged images of the landscape, captured in-flight by an edge device mounted on a UAV, to generate 3D models and perform volume estimation through manual boundary marking. These methods, although accurate, require significant time and human effort and depend heavily on GPS. We present an efficient deep learning framework that detects the object of interest and automatically determines the volume of the detected object on-the-fly, independent of GPS. Our method employs a stereo camera for depth sensing of the object and overlays a unit mesh grid over the object's boundary to perform volume estimation. We explore the accuracy vs. computational complexity trade-off on variations of our technique. Experiments indicate that our method reduces the time for volume estimation by several orders of magnitude compared with existing methods. Also, to the best of our knowledge, this is the first method that can perform volume analysis in a dynamic environment.
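The grid-based step described in the abstract can be illustrated with a small sketch: overlay a regular grid on the object, take one height per cell, and sum height times cell area. This is an assumption-laden illustration, not the authors' implementation; the function name `estimate_volume`, the height and mask layouts, and the idea that heights are already derived from stereo depth are all hypothetical.

```python
def estimate_volume(heights, mask, cell_size):
    """Approximate volume as the sum of per-cell height times cell area.

    heights   : 2D list of object heights above the ground plane (metres),
                assumed to be derived beforehand from stereo depth.
    mask      : 2D list of booleans, True where the cell lies inside the
                detected object boundary.
    cell_size : edge length of one square grid cell (metres).
    """
    cell_area = cell_size * cell_size
    total = 0.0
    for height_row, mask_row in zip(heights, mask):
        for height, inside in zip(height_row, mask_row):
            # Ignore cells outside the object and non-positive heights.
            if inside and height > 0:
                total += height * cell_area
    return total


if __name__ == "__main__":
    heights = [[1.0, 2.0], [3.0, 4.0]]
    mask = [[True, True], [True, False]]
    print(estimate_volume(heights, mask, cell_size=0.5))
```

A finer grid tracks the surface more closely but needs more cells per frame, which is one plausible form of the accuracy vs. computational complexity trade-off the abstract mentions.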

Citations: 1

CGRA4HPC 2022 Invited Speaker: Practical, scalable, and easy-to-use CGRA for HPC

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00111

Ilan Tayari

Citations: 0

Performance Analysis of Multi-Containerized MD Simulations for Low-Level Resource Allocation

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00162

Shingo Okuno, Akira Hirai, Naoto Fukumoto

Abstract: This study discusses scheduling strategies to maximize ensemble throughput, which is the total throughput of multiple containers running simultaneously. Such a strategy is useful, for example, in ensemble runs of molecular dynamics (MD) simulations. To design the strategies, we need to tackle two major challenges: (1) how many containers and how many threads per container we should allocate, and (2) which low-level resources we should allocate to reflect workload characteristics. The latter challenge is particularly important and unavoidable for performance-sensitive applications, because they rely on low-level hardware features such as simultaneous multi-threading (SMT) to maximize performance, while most container platforms do not address it. In this paper, as a preliminary experiment towards implementing SMT-aware scheduling strategies, we examined whether the ensemble throughput of MD simulations can be improved by deploying containers on separate logical cores even when they share the same physical cores. As a result, we obtained a 2.22-fold ensemble throughput compared with a one-container execution with 10 physical cores.
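The placement idea in the abstract, putting containers on separate logical cores that share physical cores, can be sketched as a small planning function. This is a hypothetical illustration rather than the authors' scheduler: the function name `split_by_smt_sibling` and the logical CPU numbering in the example are assumptions, and real sibling lists would have to be read from the host.

```python
def split_by_smt_sibling(sibling_groups):
    """Give each container one logical CPU from every physical core.

    sibling_groups: list of tuples, each holding the logical CPU ids that
    share one physical core (for 2-way SMT, two ids per tuple).
    Returns one CPU list per container; container k receives the k-th
    sibling of every physical core.
    """
    if not sibling_groups:
        return []
    # Use only as many containers as the narrowest core has SMT threads.
    ways = min(len(group) for group in sibling_groups)
    return [[group[k] for group in sibling_groups] for k in range(ways)]


if __name__ == "__main__":
    # Assumed numbering: logical CPUs 0-3 and 4-7 are siblings on 4 cores.
    plan = split_by_smt_sibling([(0, 4), (1, 5), (2, 6), (3, 7)])
    for container_id, cpus in enumerate(plan):
        print(container_id, ",".join(str(c) for c in cpus))
```

Each resulting CPU list could then be passed to a container runtime, for example through Docker's `--cpuset-cpus` option, so that two containers use different hardware threads of the same physical cores.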

Citations: 1

MultiGrid on FPGA Using Data Parallel C++

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00147

C. Siefert, Stephen L. Olivier, G. Voskuilen, Jeffrey Young

Citations: 3

17th IEEE International Workshop on Automatic Performance Tuning (iWAPT2022)

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00148

Che-Rung Lee, S. Ohshima

Citations: 0

On How to Push Efficient Medical Semantic Segmentation to the Edge: the SENECA approach

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00027

Raffaele Berzoini, E. D’Arnese, Davide Conficconi

Abstract: Semantic segmentation is the process of assigning each input image pixel a value representing a class, and it enables the clustering of pixels into object instances. It is a widely used computer vision task in various fields such as autonomous driving and medical image analysis. In particular, in medical practice, semantic segmentation identifies different regions of interest within an image, like different organs or anomalies such as tumors. Fully Convolutional Networks (FCNs) have been employed to solve semantic segmentation in different fields and have found their way into the medical domain. In this context, the low contrast among semantically different areas, constraints on energy consumption, and limited availability of computational resources increase the complexity and limit their adoption in daily practice. Based on these considerations, we propose SENECA to bring medical semantic segmentation to the edge with high energy efficiency and low segmentation time while preserving accuracy. We reached a throughput of 335.4 ± 0.34 frames per second on the FPGA, 4.65× better than its GPU counterpart, with a global Dice score of 93.04% ± 0.07 and a 12.7× improvement in energy efficiency with respect to the GPU.
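The Dice score reported in the abstract measures the overlap between a predicted segmentation and the ground truth. A minimal per-class sketch follows, assuming both are flat sequences of integer class labels; the function name `dice_score` is hypothetical, and the paper's "global" score may aggregate classes differently.

```python
def dice_score(predicted, truth, label):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for one class label.

    predicted, truth: equal-length flat sequences of per-pixel class labels.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pred_count = sum(1 for p in predicted if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(predicted, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # The class is absent from both masks; treat this as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


if __name__ == "__main__":
    # Toy labels: 1 marks the region of interest, 0 marks background.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0], label=1))
```

A score of 1.0 means the two masks coincide exactly for that class, and 0.0 means they do not overlap at all.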

Citations: 1

Litener: An Accelerator-Enabled Lightweight Container for Edge Computing

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00158

Ryan Dyson, C. Reaño

Citations: 1