{"title":"Towards a GPU accelerated selective sparsity multilayer perceptron algorithm using K-Nearest Neighbors search","authors":"B. H. Meyer, Wagner M. Nunan Zola","doi":"10.1145/3547276.3548634","DOIUrl":"https://doi.org/10.1145/3547276.3548634","url":null,"abstract":"The use of artificial neural networks and deep learning is common in several areas of knowledge. In many situations, it is necessary to use neural networks with many neurons. For example, the Extreme Classification problems can use neural networks that process more than 500,000 classes and inputs with more than 100,000 dimensions, which can make the training process unfeasible due to the high computational cost required. To overcome this limitation, several techniques were proposed in past works, such as the SLIDE algorithm, whose implementation is based on the construction of hash tables and on CPU parallelism. This work proposes the SLIDE-GPU, which replaces the use of hash tables by algorithms that use GPU to search for approximate neighbors, or approximate nearest neighbors (ANN) search. In addition, SLIDE-GPU also proposes the use of GPU to accelerate the activation step of neural networks. Among the experiments carried out, it was possible to notice a training process acceleration of up to 268% in execution time considering the inference accuracy, although currently maintaining the backpropagation phase with CPU processing. This suggests that further acceleration can be obtained in future work, by using massive parallelism in the entire process. The ANN-based technique provides better inference accuracy at each epoch, which helps producing the global acceleration, besides using the GPU in the neuron activation step. The GPU neuron activation acceleration reached a 28.09 times shorter execution time compared to the CPU implementation on this step alone.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129570742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Support of the Scan Vector Model for RISC-V Vector Extension","authors":"Hung-Ming Lai, Jenq-Kuen Lee","doi":"10.1145/3547276.3548518","DOIUrl":"https://doi.org/10.1145/3547276.3548518","url":null,"abstract":"RISC-V vector extension (RVV) provides wide vector registers, which is applicable for workloads with high data-level parallelism such as machine learning or cloud computing. However, it is not easy for developers to fully utilize the underlying performance of a new architecture. Hence, abstractions such as primitives or software frameworks could be employed to ease this burden. Scan, also known as all-prefix-sum, is a common building block for many parallel algorithms. Blelloch presented an algorithmic model called the scan vector model, which uses scan operations as primitives, and demonstrates that a broad range of applications and algorithms can be implemented by them. In our work, we present an efficient support of the scan vector model for RVV. With this support, parallel algorithms can be developed upon those primitives without knowing the details of RVV while gaining the performance that RVV provides. In addition, we provide an optimization scheme related to the length multiplier feature of RVV, which can further improve the utilization of the vector register files. The experiment shows that our support of scan and segmented scan for RVV can achieve 2.85x and 4.29x speedup, respectively, compared to the sequential implementation. With further optimization using the length multiplier of RVV, we can improve the previous result to 21.93x and 15.09x speedup.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127011867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Data-flow Visual Programing Language*","authors":"Hongxin Wang, Qiuming Luo, Zheng Du","doi":"10.1145/3547276.3548525","DOIUrl":"https://doi.org/10.1145/3547276.3548525","url":null,"abstract":"In this paper, we introduced a Hybrid Data-flow Visual Programing Language (HDVPL), which is an extended C/C++ language with a visual frontend and a dataflow runtime library. Although, most of the popular dataflow visual programming languages are designed for specialized purposes, HDVPL is for general-purpose programming. Unlike the others, the dataflow node behavior of HDVPL can be customized by programmer. Our intuitive visual interface can easily build a general-purpose dataflow program. It provides a visual editor to create nodes and connect them to form a DAG of dataflow task. This makes the beginner of computer programming capable of building parallel programs easily. With subgraph feature, complex hierarchical graphs can be built with container node. After the whole program is accomplished, the HDVPL can translate it into text-based source code and compile it into object file, which will be linked with HDVPL dataflow runtime library. To visualize dataflow programs in runtime, we integrated our dataflow library with frontend visual editor. The visual frontend will show the detailed information about the running program in console window.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133894347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast and Secure AKA Protocol for B5G","authors":"Jung-Hsien Wu, Jie Yang, Yung-Chin Chang, Min-Te Sun","doi":"10.1145/3547276.3548440","DOIUrl":"https://doi.org/10.1145/3547276.3548440","url":null,"abstract":"With the popularity of mobile devices, the mobile service requirements are now changing rapidly. This implies that the micro network operator dedicated to a specific sector of users has the potential to improve the 5G architecture in terms of scalability and autonomy. However, the traditional AKA protocol does not allow the micro operator to authenticate mobile users independently. To solve this problem, we propose the Fast AKA protocol, which disseminates a subscriber’s profile among base stations via a Blockchain and mutually authenticates the subscriber and serving base station locally for roaming. The proposed architecture speeds up the authentication process, provides forward/backward secrecy, and resists replay attack as well as man-in-the-middle attack. We believe that Fast AKA can serve as a cornerstone for B5G.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"120 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115839598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A User-Based Bike Return Algorithm for Docked Bike Sharing Systems","authors":"Donghui Chen, Kazuya Sakai","doi":"10.1145/3547276.3548443","DOIUrl":"https://doi.org/10.1145/3547276.3548443","url":null,"abstract":"Recently, the development of Internet connection, intelligence, and sharing in the bicycle industry has assisted bike sharing systems (BSS’s) in establishing a connection between public transport hubs. In this paper, we propose a novel user-based bike return (UBR) algorithm for docked BSS’s which leverages a dynamic price adjustment mechanism so that the system is able to rebalance the number of lent and returned bikes by itself at different docks nearby. The proposed scheme motivates users to return their bikes to other underflow docks close-by their target destinations through a cheaper plan to compensate the shortage in them. Consequentially, the bike sharing system is able to achieve dynamic self-balance and the operational cost of the entire system for operators is reduced while the satisfaction of users is significantly increased. The simulations are conducted using real traces, called Citi Bike, and the results demonstrate that the proposed UBR achieves its design goals.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116190324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenMP Offloading in the Jetson Nano Platform","authors":"Ilias K. Kasmeridis, V. Dimakopoulos","doi":"10.1145/3547276.3548517","DOIUrl":"https://doi.org/10.1145/3547276.3548517","url":null,"abstract":"The nvidia Jetson Nano is a very popular system-on-module and developer kit which brings high-performance specs in a small and power-efficient embedded platform. Integrating a 128-core gpu and a quad-core cpu, it provides enough capabilities to support computationally demanding applications such as AI inference, deep learning and computer vision. While the Jetson Nano family supports a number of apis and libraries out of the box, comprehensive support of OpenMP, one of the most popular apis, is not readily available. In this work we present the implementation of an OpenMP infrastructure that is able to harness both the cpu and the gpu of a Jetson Nano board using the offload facilities of the recent versions of the OpenMP specifications. We discuss the compiler-side transformations of key constructs, the generation of cuda-based code as well as how the runtime support is provided. We also provide experimental results for a number of applications, exhibiting performance comparable with their pure cuda versions.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126680829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting High Definition Map Information from Aerial Images","authors":"Guan-Wen Chen, Hsueh-Yi Lai, Tsì-Uí İk","doi":"10.1145/3547276.3548442","DOIUrl":"https://doi.org/10.1145/3547276.3548442","url":null,"abstract":"Compared with traditional digital maps, high definition maps (HD maps) collect information in lane-level instead of road-level, and provide more diverse and detailed road network information, including lane markings, speed limits, rules, and intersection junction. HD maps can be used for driving navigation and autonomous driving cars with high-precision information to improve driving safety. However, it takes a lot of time to construct the HD map, so that the HD map cannot be widely used in applications at present. This paper proposes a method to identify road information through semantic image segmentation algorithm from aerial traffic images, and then convert it into the open source HD map standard format, which is OpenDRIVE. Through experiments, 13 categories of lane markings can be identified with mIoU of 84.3% and mPA of 89.6%.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116259525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Runtime Techniques for Automatic Process Virtualization","authors":"Evan Ramos, Sam White, A. Bhosale, L. Kalé","doi":"10.1145/3547276.3548522","DOIUrl":"https://doi.org/10.1145/3547276.3548522","url":null,"abstract":"Asynchronous many-task runtimes look promising for the next generation of high performance computing systems. But these runtimes are usually based on new programming models, requiring extensive programmer effort to port existing applications to them. An alternative approach is to reimagine the execution model of widely used programming APIs, such as MPI, in order to execute them more asynchronously. Virtualization is a powerful technique that can be used to execute a bulk synchronous parallel program in an asynchronous manner. Moreover, if the virtualized entities can be migrated between address spaces, the runtime can optimize execution with dynamic load balancing, fault tolerance, and other adaptive techniques. Previous work on automating process virtualization has explored compiler approaches, source-to-source refactoring tools, and runtime methods. These approaches achieve virtualization with different tradeoffs in terms of portability (across different architectures, operating systems, compilers, and linkers), programmer effort required, and the ability to handle all different kinds of global state and programming languages. We implement support for three different related runtime methods, discuss shortcomings and their applicability to user-level virtualized process migration, and compare performance to existing approaches. Compared to existing approaches, one of our new methods achieves what we consider the best overall functionality in terms of portability, automation, support for migration, and runtime performance.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121200917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training reinforcement learning models via an adversarial evolutionary algorithm","authors":"M. Coletti, Chathika Gunaratne, Catherine D. Schuman, Robert M. Patton","doi":"10.1145/3547276.3548635","DOIUrl":"https://doi.org/10.1145/3547276.3548635","url":null,"abstract":"When training for control problems, more episodes used in training usually leads to better generalizability, but more episodes also requires significantly more training time. There are a variety of approaches for selecting the way that training episodes are chosen, including fixed episodes, uniform sampling, and stochastic sampling, but they can all leave gaps in the training landscape. In this work, we describe an approach that leverages an adversarial evolutionary algorithm to identify the worst performing states for a given model. We then use information about these states in the next cycle of training, which is repeated until the desired level of model performance is met. We demonstrate this approach with the OpenAI Gym cart-pole problem. We show that the adversarial evolutionary algorithm did not reduce the number of episodes required in training needed to attain model generalizability when compared with stochastic sampling, and actually performed slightly worse.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"402 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133610297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early Experiences","authors":"Cristian Peñaranda Cebrián, C. Reaño, F. Silla","doi":"10.1145/3547276.3548628","DOIUrl":"https://doi.org/10.1145/3547276.3548628","url":null,"abstract":"The amount of Internet of Things (IoT) devices has been increasing in the last years. These are usually low-performance devices with slow network connections. A common improvement is therefore to perform some computations at the edge of the network (e.g. preprocessing data), thereby reducing the amount of data sent through the network. To enhance the computing capabilities of edge devices, remote virtual Graphics Processing Units (GPUs) can be used. Thus, edge devices can leverage GPUs installed in remote computers. However, this solution requires exchanging data with the remote GPU across the network, which as mentioned is typically slow. In this paper we present a novel approach to improve communication performance of edge devices using rCUDA remote GPU virtualization framework. We implement within this framework on-the-fly pipelined data compression, which is done transparently to applications. We use four popular machine learning samples to carry out an initial performance exploration. The analysis is done using a slow 10 Mbps network to emulate the conditions of these devices. Early results show potential improvements provided some current issues are addressed.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134316450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}