{"title":"Informed Prefetching in I/O Bounded Distributed Deep Learning","authors":"X. Ruan, Haiquan Chen","doi":"10.1109/IPDPSW52791.2021.00127","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00127","url":null,"abstract":"Deep learning research has been growing rapidly in the past decade for the significant performance improvement on GPUs. While the computing capability of current GPUs is tremendous, data pre-processing/loading becomes a potential bottleneck that incurs major training latency and adds overhead in both CPU and memory, especially when datasets are too large to fit in memory. When datasets are stripped on distributed file systems, access to a remote storage node may deteriorate I/O performance significantly due to network I/O latency in cloud. Moreover, some deep learning workloads may be assigned to remote GPU servers in Edge Computing which results in even higher network I/O latency. Therefore, it is desirable to provide efficient parallel and distributed prefetching solution which is able to reduce the I/O cost of data pre-processing before feeding the data into GPUs for training on distributed storage systems of Cloud or Edge. Although the current deep learning frameworks like PyTorch or TensorFlow offer multiprocessing data loading functionalities, their approaches come at the price of high computing resource usage and memory usage. In this paper, we presented a novel thread-level Informed Prefetching Data Loader framework, IPDL, in support of efficient data prefetching from remote storage nodes in distributed deep learning environments and possibly in Edge Computing. Compared to its counterparts in PyTorch, IPDL is able to provide accelerated I/O performance for data loading while consuming lower computing resource and memory space at the same time. Extensive experiments on both an individual server and a cluster computing system have shown the superiority of IPDL over the latest implementation of PyTorch.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129727513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CPRIC: Collaborative Parallelism for Randomized Incremental Constructions","authors":"Florian Fey, S. Gorlatch","doi":"10.1109/IPDPSW52791.2021.00081","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00081","url":null,"abstract":"Randomized algorithms often outperform their deterministic counterparts in terms of simplicity and efficiency. In this paper, we consider Randomized Incremental Constructions (RICs) that are very popular, in particular in combinatorial optimization and computational geometry. Our contribution is Collaborative Parallel RIC (CPRIC) –a novel approach to parallelizing RIC for modern parallel architectures like vector processors and GPUs. We show that our approach based on a work-stealing mechanism avoids the control-flow divergence of parallel threads, thus improving the performance of parallel implementation. Our extensive experiments on CPU and GPU demonstrate the advantages of our CPRIC approach that achieves an average speedup between 4× and 5× compared to the naively parallelized RIC.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127610628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Cache Complexity Using Data Movement Distance (DMD)","authors":"Donovan Snyder, C. Ding","doi":"10.1109/IPDPSW52791.2021.00070","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00070","url":null,"abstract":"Given the ubiquity of cache-based machines, it is important to analyze how well a program or an algorithm uses the cache. There is no widely accepted measure of cache complexity, yet the cache complexity is often more important to performance than the measures of time and space complexity. This paper presents Data Movement Distance (DMD) to measure the cost of cache complexity for algorithms, demonstrates its use, and discusses it as a measure of locality. Since processor speeds are getting ever faster, one of the main bottlenecks in modern computing is moving the needed data into and around the processor. DMD measures the efficiency of the algorithm in this sense and therefore may be a much-needed complement to the conventional analysis of computation complexity. In this paper, we give an overview of DMD and some basic results. These will be expanded upon in future work.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124315761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the EduPar-21 Workshop Chair","authors":"","doi":"10.1109/ipdpsw52791.2021.00055","DOIUrl":"https://doi.org/10.1109/ipdpsw52791.2021.00055","url":null,"abstract":"","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123912611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows","authors":"Zhe Wang, P. Subedi, Matthieu Dorier, Philip E. Davis, M. Parashar","doi":"10.1109/IPDPSW52791.2021.00152","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00152","url":null,"abstract":"In-situ and in-transit processing alleviate the gap between the computing and I/O capabilities by scheduling data analytics close to the data source. Hybrid in-situ processing splits data analytics into two stages: the data processing that runs in-situ aims to extract regions of interest, which are then transferred to staging services for further in-transit analytics. To facilitate this type of hybrid in-situ processing, the data staging service needs to support complex intermediate data representations generated/consumed by the in-situ tasks. Unstructured (or irregular) mesh is one such derived data representation that is typically used and bridges simulation data and analytics. However, how staging services efficiently support unstructured mesh transfer and processing remains to be explored. This paper investigates design options for transferring and processing unstructured mesh data using staging services. Using polygonal mesh data as an example, we show that hybrid in-situ workflows with staging-based unstructured mesh processing can effectively support hybrid in-situ workflows, and can significantly decrease data movement overheads.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131107660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autonomous Load Balancing in Distributed Hash Tables Using Churn and the Sybil Attack","authors":"Andrew Rosen, Benjamin Levin, A. Bourgeois","doi":"10.1109/IPDPSW52791.2021.00097","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00097","url":null,"abstract":"Distributed Hash Tables (DHTs) are an integral foundation for a variety of modern internet applications. In previous work, we have shown that DHTs can also be used as a means of organizing a large number of workers to tackle large-scale computing problems in a fault tolerant context. Whether a DHT is being used for file access or distributing a large-scale computing job, a cryptographic hash function is used to assign keys for nodes and data. Ideally, these would be uniformly distributed across the available range, thus evenly distributing the nodes and tasks. However, this is rarely the case in practice and as a result, the workload can become highly unbalanced. To address this issue, there have been numerous methods proposed for load balancing DHTs, but often they are a centralized approach.In this paper, we present four methods to autonomously balance the load of DHTs: 1) induced churn; 2) random injection of Sybil Nodes; 3) neighbor injection; and 4) invitation of nodes with low workloads. Each approach is completely decentralized, requiring minimal overhead, with individual nodes making decisions based only upon local information. What makes our approach unique is that the strategies rely on using the inherent churn in a DHT or by a variation of the Sybil attack to balance the workload. We simulate the four strategies on a Chord DHT and show they significantly rebalance the workload in a DHT. The strategy of randomly injecting virtual \"Sybil\" nodes performed the best in terms of balance and speedup.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131385506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Selection of Tensor Decomposition for Compressing Convolutional Neural Networks A Case Study on VGG-type Networks","authors":"Chia-Chun Liang, Che-Rung Lee","doi":"10.1109/IPDPSW52791.2021.00115","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00115","url":null,"abstract":"Tensor decomposition is one of the model reduction techniques for compressing deep neural networks. Existing methods use either Tucker decomposition (TD) or Canonical Polyadic decomposition (CPD) for model compression, but none of them tried to combine those two methods, owing to the complexity of choosing a proper decomposition method for each layer. In this paper, we adopted the automatic tuning technique to design an algorithm that can mix both tensor decomposition methods, called Mixed Tensor Decomposition (MTD). The goal is to achieve better compression ratio while keeping similar accuracy as the original models. We used VGG type networks for the case study since they are relatively heavy and computationally expensive. We first studied the relation of model accuracy and compression ratio for Tucker and CPD applying to convolution neural networks (CNN). Based on the studied results, we designed a strategy to select the most suitable decomposition method for each layer, and further fine-tunes the models to recover the accuracy. We have conducted experiments using VGG11 and VGG16 with CIFAR10 dataset, and compared MTD with other tensor decomposition algorithms. The results show that MTD can achieve compression ratio 32 × and 37 × for VGG11 and VGG16 respectively with less than 1% accuracy drops, which is much better than the state-of-the-art tensor decomposition algorithms for model compression.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126945283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA","authors":"Y. Sasaki, Ayumu Ishizuka, Mulya Agung, H. Takizawa","doi":"10.1109/IPDPSW52791.2021.00113","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00113","url":null,"abstract":"In a heterogeneous computing system, different kinds of processors might need to be involved in the execution of a file I/O operation. Since NEC SX-Aurora TSUBASA is one such system, two I/O acceleration mechanisms are offered to reduce the data transfer overheads among the processors for a file I/O operation. This paper first investigates the effects of the two mechanisms on the I/O performance of SX-Aurora TSUBASA. Considering the results, proper use of the two mechanisms is discussed via a real-world application of flood damage estimation. These results clearly demonstrate the demand for auto-tuning, i.e., adaptively selecting either of the two mechanisms with considering application behaviors and system configuration.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114573844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Highly Available Multi-Objective Neural Architecture Search in Bare Metal Kubernetes Cluster","authors":"Andreas Klos, Marius Rosenbaum, W. Schiffmann","doi":"10.1109/IPDPSW52791.2021.00094","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00094","url":null,"abstract":"The interest in deep neural networks for solving computer vision task has dramatically increased. Due to the heavy influence of the neural networks architecture on its predictive accuracy, neural architecture search has gained much attention in recent years. This research area typically implies a high computational burden and thus, requires high scalability as well as availability to ensure no data loss or waist of computational power. Moreover, the thinking of developing applications has changed from monolithic once to microservices. Hence, we developed a highly scalable and available multi-objective neural architecture search and adopted to the modern thinking of developing application by subdividing an already existing, monolithic neural architecture search – based on a genetic algorithm – into microservices. Furthermore, we adopted the initial population creation by 1,000 mutations of each individual, extended the approach by inception layers, implemented it as island model to facilitate scalability and achieved on MNIST, Fashion-MNIST and CIFAR-10 dataset 99.75%, 94.35% and 89.90% test accuracy respectively. Besides, our model is strongly focused on high availability empowered by the deployment in our bare-metal Kubernetes cluster. Our results show that the introduced multi-objective neural architecture search can easily handle even the loss of nodes and proceed the algorithm within seconds on another node without any loss of results or the necessity of human interaction.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122159980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelization of GKV benchmark using OpenACC","authors":"Makoto Morishita, S. Ohshima, T. Katagiri, Toru Nagai","doi":"10.1109/IPDPSW52791.2021.00109","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00109","url":null,"abstract":"The computing power of the Graphics Processing Unit (GPU) has received great attention in recent years, as 140 supercomputers with NVIDIA GPUs were ranked in the TOP500 for November 2020 [1]. However, CUDA, which is widely used in GPU programming, needs to be written at a low level and often requires the specialized knowledge of the GPU memory hierarchy and execution models. In this study, we used OpenACC [2], which semi-automatically generates kernel code by inserting directives into a program to speed up the application. The target application was benchmark program based on the plasma turbulence analysis code, gyrokinetic Vlasov code (GKV). With our implementation of OpenACC, kernel2, kernel3, and kernel4 of the benchmark were 31.43, 7.08, and 10.74 times faster, respectively, compared to CPU sequential execution. Thus, we succeeded in increasing the applications’ speed. In the future, we will port the rest of the code to the GPU environment to run the entire GKV on GPUs.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115408498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}