Latest Articles: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Informed Prefetching in I/O Bounded Distributed Deep Learning
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00127
X. Ruan, Haiquan Chen
Deep learning research has grown rapidly in the past decade thanks to significant performance improvements on GPUs. While the computing capability of current GPUs is tremendous, data pre-processing and loading become a potential bottleneck that incurs major training latency and adds overhead on both CPU and memory, especially when datasets are too large to fit in memory. When datasets are striped across distributed file systems, access to a remote storage node may deteriorate I/O performance significantly due to network I/O latency in the cloud. Moreover, some deep learning workloads may be assigned to remote GPU servers in Edge Computing, which results in even higher network I/O latency. It is therefore desirable to provide an efficient parallel and distributed prefetching solution that reduces the I/O cost of data pre-processing before the data is fed into GPUs for training on distributed storage systems in the Cloud or at the Edge. Although current deep learning frameworks such as PyTorch and TensorFlow offer multiprocessing data-loading functionality, their approaches come at the price of high computing resource and memory usage. In this paper, we present IPDL, a novel thread-level Informed Prefetching Data Loader framework that supports efficient data prefetching from remote storage nodes in distributed deep learning environments and, potentially, in Edge Computing. Compared to its counterparts in PyTorch, IPDL provides accelerated I/O performance for data loading while consuming less computing resource and memory at the same time. Extensive experiments on both an individual server and a cluster computing system show the superiority of IPDL over the latest implementation of PyTorch.
Citations: 2
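The paper's IPDL implementation is not reproduced in this listing. As a rough, hypothetical sketch of the general idea it builds on — a background thread prefetching samples into a bounded queue so that loading overlaps training while memory stays capped — consider the following (the names `prefetching_loader`, `load_fn`, and `depth` are ours, not IPDL's API):

```python
import threading
import queue

def prefetching_loader(load_fn, items, depth=4):
    """Yield loaded items while a background thread prefetches ahead.

    load_fn -- reads/pre-processes one item (the I/O-bound part)
    items   -- iterable of item identifiers (e.g. file paths)
    depth   -- bounded queue depth; caps memory used by prefetched data
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()  # marks end of the stream

    def worker():
        for it in items:
            q.put(load_fn(it))  # blocks when the queue is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch
```

A single prefetch thread like this keeps CPU and memory usage low compared to spawning worker processes, which is the trade-off the abstract attributes to multiprocessing loaders.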
CPRIC: Collaborative Parallelism for Randomized Incremental Constructions
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00081
Florian Fey, S. Gorlatch
Randomized algorithms often outperform their deterministic counterparts in terms of simplicity and efficiency. In this paper, we consider Randomized Incremental Constructions (RICs), which are very popular, in particular in combinatorial optimization and computational geometry. Our contribution is Collaborative Parallel RIC (CPRIC), a novel approach to parallelizing RIC for modern parallel architectures such as vector processors and GPUs. We show that our approach, based on a work-stealing mechanism, avoids the control-flow divergence of parallel threads, thus improving the performance of the parallel implementation. Our extensive experiments on CPU and GPU demonstrate the advantages of our CPRIC approach, which achieves an average speedup between 4× and 5× compared to naively parallelized RIC.
Citations: 0
Measuring Cache Complexity Using Data Movement Distance (DMD)
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00070
Donovan Snyder, C. Ding
Given the ubiquity of cache-based machines, it is important to analyze how well a program or an algorithm uses the cache. There is no widely accepted measure of cache complexity, yet cache complexity is often more important to performance than the measures of time and space complexity. This paper presents Data Movement Distance (DMD) to measure the cost of cache complexity for algorithms, demonstrates its use, and discusses it as a measure of locality. Since processor speeds keep increasing, one of the main bottlenecks in modern computing is moving the needed data into and around the processor. DMD measures the efficiency of an algorithm in this sense and may therefore be a much-needed complement to the conventional analysis of computational complexity. In this paper, we give an overview of DMD and some basic results, which will be expanded upon in future work.
Citations: 1
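The precise definition of DMD is given in the paper itself and is not reproduced here. A standard building block for locality measures of this kind is the LRU stack (reuse) distance — the number of distinct addresses touched between two accesses to the same address — sketched below purely as background, not as the authors' DMD:

```python
def stack_distances(trace):
    """LRU stack distances for a sequence of memory accesses.

    Returns one value per access: the number of distinct addresses
    touched since the previous access to the same address, or None
    for a first-time (cold) access.
    """
    stack = []   # LRU stack; most recently used address is last
    dists = []
    for addr in trace:
        if addr in stack:
            i = stack.index(addr)
            dists.append(len(stack) - 1 - i)  # distinct addrs above it
            stack.pop(i)
        else:
            dists.append(None)  # cold access
        stack.append(addr)      # addr becomes most recently used
    return dists
```

An access with stack distance less than the cache size (in blocks) hits in a fully associative LRU cache, which is why distance profiles like this feed into machine-independent locality analysis.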
Message from the EduPar-21 Workshop Chair
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/ipdpsw52791.2021.00055
Citations: 0
Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00152
Zhe Wang, P. Subedi, Matthieu Dorier, Philip E. Davis, M. Parashar
In-situ and in-transit processing alleviate the gap between computing and I/O capabilities by scheduling data analytics close to the data source. Hybrid in-situ processing splits data analytics into two stages: the data processing that runs in-situ aims to extract regions of interest, which are then transferred to staging services for further in-transit analytics. To facilitate this type of hybrid in-situ processing, the data staging service needs to support the complex intermediate data representations generated and consumed by the in-situ tasks. The unstructured (or irregular) mesh is one such derived data representation that is typically used to bridge simulation data and analytics. However, how staging services can efficiently support unstructured mesh transfer and processing remains to be explored. This paper investigates design options for transferring and processing unstructured mesh data using staging services. Using polygonal mesh data as an example, we show that staging-based unstructured mesh processing can effectively support hybrid in-situ workflows and significantly decrease data movement overheads.
Citations: 1
Autonomous Load Balancing in Distributed Hash Tables Using Churn and the Sybil Attack
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00097
Andrew Rosen, Benjamin Levin, A. Bourgeois
Distributed Hash Tables (DHTs) are an integral foundation for a variety of modern internet applications. In previous work, we have shown that DHTs can also be used to organize a large number of workers to tackle large-scale computing problems in a fault-tolerant context. Whether a DHT is being used for file access or for distributing a large-scale computing job, a cryptographic hash function is used to assign keys to nodes and data. Ideally, these would be uniformly distributed across the available range, evenly distributing the nodes and tasks. However, this is rarely the case in practice, and as a result the workload can become highly unbalanced. Numerous methods have been proposed for load balancing DHTs, but they often take a centralized approach. In this paper, we present four methods to autonomously balance the load of DHTs: 1) induced churn; 2) random injection of Sybil nodes; 3) neighbor injection; and 4) invitation of nodes with low workloads. Each approach is completely decentralized and requires minimal overhead, with individual nodes making decisions based only upon local information. What makes our approach unique is that the strategies rely on the inherent churn in a DHT, or on a variation of the Sybil attack, to balance the workload. We simulate the four strategies on a Chord DHT and show that they significantly rebalance the workload. The strategy of randomly injecting virtual "Sybil" nodes performed best in terms of balance and speedup.
Citations: 1
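The paper's four strategies are not reproduced here, but the imbalance they target — and the effect of adding extra virtual identities ("Sybil"-like points) to the ring — can be illustrated with a minimal Chord-style key-assignment sketch (function and parameter names are ours, assumed for illustration):

```python
import hashlib
from bisect import bisect_left

SPACE = 2**32  # size of the identifier ring

def ring_pos(name):
    """Deterministic position of a name on the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

def load_per_node(physical, keys, vnodes=1):
    """Count keys owned by each physical node on a Chord-style ring.

    Each physical node appears as `vnodes` points on the ring; a key is
    owned by its clockwise successor point. More points per node tends
    to smooth out the uneven gaps produced by hashing node IDs alone.
    """
    ring = sorted((ring_pos(f"{n}#{v}"), n)
                  for n in physical for v in range(vnodes))
    pts = [p for p, _ in ring]
    load = {n: 0 for n in physical}
    for k in keys:
        i = bisect_left(pts, ring_pos(k)) % len(ring)  # clockwise successor
        load[ring[i][1]] += 1
    return load
```

Comparing `load_per_node(nodes, keys, vnodes=1)` with a larger `vnodes` value typically shows the spread between the most- and least-loaded nodes shrinking, which is the intuition behind injecting virtual nodes to rebalance load.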
Automatic Selection of Tensor Decomposition for Compressing Convolutional Neural Networks: A Case Study on VGG-type Networks
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00115
Chia-Chun Liang, Che-Rung Lee
Tensor decomposition is one of the model-reduction techniques for compressing deep neural networks. Existing methods use either Tucker decomposition (TD) or Canonical Polyadic decomposition (CPD) for model compression, but none of them combine the two methods, owing to the complexity of choosing a proper decomposition method for each layer. In this paper, we adopt automatic tuning techniques to design an algorithm that can mix both tensor decomposition methods, called Mixed Tensor Decomposition (MTD). The goal is to achieve a better compression ratio while keeping accuracy similar to the original models. We use VGG-type networks for the case study since they are relatively heavy and computationally expensive. We first study the relation between model accuracy and compression ratio when Tucker decomposition and CPD are applied to convolutional neural networks (CNNs). Based on these results, we design a strategy to select the most suitable decomposition method for each layer, and further fine-tune the models to recover accuracy. We have conducted experiments using VGG11 and VGG16 with the CIFAR10 dataset and compared MTD with other tensor decomposition algorithms. The results show that MTD achieves compression ratios of 32× and 37× for VGG11 and VGG16, respectively, with less than 1% accuracy drop, which is much better than state-of-the-art tensor decomposition algorithms for model compression.
Citations: 1
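MTD's actual selection strategy also weighs accuracy and relies on fine-tuning, which is beyond a listing like this. Still, one ingredient of any per-layer choice — comparing the parameter counts that rank-R CPD and a channel-mode Tucker-2 decomposition would leave in a T×S×d×d convolution kernel — can be sketched with simple arithmetic (the ranks and function names below are assumed for illustration, not taken from the paper):

```python
def cp_params(T, S, d, R):
    """Parameters of a rank-R CP factorization of a T x S x d x d kernel:
    four factor matrices of sizes T*R, S*R, d*R, d*R."""
    return R * (T + S + 2 * d)

def tucker2_params(T, S, d, R1, R2):
    """Parameters of a Tucker-2 factorization (channel modes only):
    S->R1 projection, R1 x R2 x d x d core, R2->T projection."""
    return S * R1 + R1 * R2 * d * d + T * R2

def pick_decomposition(T, S, d, R, R1, R2):
    """Pick whichever method leaves fewer parameters in this layer."""
    cp, tk = cp_params(T, S, d, R), tucker2_params(T, S, d, R1, R2)
    return ("CP", cp) if cp <= tk else ("Tucker", tk)
```

For a 256×256×3×3 layer (589,824 parameters), rank-64 CP keeps 33,152 parameters while Tucker-2 with ranks (64, 64) keeps 69,632 — illustrating why the better method can differ layer by layer as shapes and ranks vary.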
Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00113
Y. Sasaki, Ayumu Ishizuka, Mulya Agung, H. Takizawa
In a heterogeneous computing system, different kinds of processors might need to be involved in the execution of a file I/O operation. Since NEC SX-Aurora TSUBASA is one such system, two I/O acceleration mechanisms are offered to reduce the data-transfer overheads among the processors during a file I/O operation. This paper first investigates the effects of the two mechanisms on the I/O performance of SX-Aurora TSUBASA. Considering the results, the proper use of the two mechanisms is discussed via a real-world application of flood damage estimation. These results clearly demonstrate the demand for auto-tuning, i.e., adaptively selecting one of the two mechanisms while considering application behaviors and the system configuration.
Citations: 1
Scalable and Highly Available Multi-Objective Neural Architecture Search in a Bare Metal Kubernetes Cluster
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00094
Andreas Klos, Marius Rosenbaum, W. Schiffmann
Interest in deep neural networks for solving computer vision tasks has dramatically increased. Due to the heavy influence of a neural network's architecture on its predictive accuracy, neural architecture search has gained much attention in recent years. This research area typically implies a high computational burden and thus requires high scalability as well as availability, to ensure no loss of data or waste of computational power. Moreover, the approach to developing applications has shifted from monolithic designs to microservices. Hence, we developed a highly scalable and available multi-objective neural architecture search, adopting this modern development model by subdividing an existing, monolithic neural architecture search based on a genetic algorithm into microservices. Furthermore, we adapted the initial population creation by applying 1,000 mutations to each individual, extended the approach with inception layers, implemented it as an island model to facilitate scalability, and achieved 99.75%, 94.35%, and 89.90% test accuracy on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, respectively. In addition, our model is strongly focused on high availability, empowered by deployment in our bare-metal Kubernetes cluster. Our results show that the introduced multi-objective neural architecture search can easily handle even the loss of nodes and resume the algorithm within seconds on another node, without any loss of results or the need for human interaction.
Citations: 1
Parallelization of the GKV Benchmark Using OpenACC
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00109
Makoto Morishita, S. Ohshima, T. Katagiri, Toru Nagai
The computing power of the Graphics Processing Unit (GPU) has received great attention in recent years, as 140 supercomputers with NVIDIA GPUs were ranked in the TOP500 list for November 2020 [1]. However, CUDA, which is widely used in GPU programming, must be written at a low level and often requires specialized knowledge of the GPU memory hierarchy and execution models. In this study, we used OpenACC [2], which semi-automatically generates kernel code from directives inserted into a program, to speed up the application. The target application was a benchmark program based on the plasma turbulence analysis code, the gyrokinetic Vlasov code (GKV). With our OpenACC implementation, kernel2, kernel3, and kernel4 of the benchmark ran 31.43, 7.08, and 10.74 times faster, respectively, compared to sequential CPU execution. Thus, we succeeded in speeding up the application. In the future, we will port the rest of the code to the GPU environment to run the entire GKV on GPUs.
Citations: 0