Parallel Computing: Latest Articles

Optimizing small channel 3D convolution on GPU with tensor core
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102954
Jiazhi Jiang, Dan Huang, Jiangsu Du, Yutong Lu, Xiangke Liao
Abstract: In many scenarios, particularly scientific AI applications, algorithm engineers widely adopt more complex convolutions, e.g. 3D CNNs, to improve accuracy. Scientific AI applications with 3D CNNs, which tend to train on volumetric datasets, substantially increase the size of the input, which in turn potentially restricts the channel sizes (e.g. fewer than 64) under the constraints of limited device memory capacity. Since existing convolution implementations tend to split and parallelize the small-channel convolution along the channel dimension, they usually cannot fully exploit the performance of the GPU accelerator, in particular one configured with the emerging tensor cores.
In this work, we target the performance of small-channel 3D convolution on GPU platforms configured with tensor cores. Our analysis shows that the channel size of a convolution has a great effect on the performance of existing convolution implementations, which are memory-bound on tensor cores. By leveraging the memory hierarchy characteristics and the WMMA API of tensor cores, we propose and implement holistic optimizations that both improve data access efficiency and increase the utilization of computing units. Experiments show that our implementation obtains 1.1x–5.4x speedup compared to cuDNN's implementations of 3D convolution on different GPU platforms. We also evaluate our implementations on two practical scientific AI applications and observe up to 1.7x and 2.0x overall speedups compared with using cuDNN on a V100 GPU.
Volume 113, Article 102954
Citations: 4
Graph optimization algorithm using symmetry and host bias for low-latency indirect network
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.2139/ssrn.4048955
M. Nakao, M. Tsukamoto, Y. Hanada, Keiji Yamamoto
Abstract: (none provided)
Volume "36 1", pages 102983 (as listed)
Citations: 1
A method for efficient radio astronomical data gridding on multi-core vector processor
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102972
Hao Wang, Ce Yu, Jian Xiao, Shanjiang Tang, Yu Lu, Hao Fu, Bo Kang, Gang Zheng, Chenzhou Cui
Abstract: Gridding is the performance-critical step in the data reduction pipeline for radio astronomy research, allowing astronomers to create the correct sky images for further analysis. Like 2D stencil computation, gridding iteratively updates the output cells by convolution, where the value at each output cell is computed as a weighted sum of neighboring point values. Existing state-of-the-art works have improved gridding performance by using multi-core CPUs and GPUs in real-world applications, and they showed that gridding is a type of scientific computation with high-density computing characteristics. However, low computational performance or high power consumption becomes the main limitation when they process large-scale astronomical data. The high-density computing feature of gridding provides opportunities to accelerate it on multi-core vector processors with vector-SIMD architectures. However, the task parallelization and data transfer strategies of existing works (such as those implemented on CPUs or GPUs) are inefficient for performing gridding directly on the vector processor without a dedicated mapping algorithm.
M-DSP is a multi-core vector processor with vector-SIMD architectures designed for the next-generation exascale supercomputer, delivering high performance with ultra-low power consumption. In this paper, we present, for the first time, a novel method to achieve efficient gridding on the M-DSP. Specifically, we propose a gridding workflow designed for the vector-SIMD architectures and present a vectorized version of the gridding convolution algorithm to fully exploit the computational power of the M-DSP. In addition, centering on the processor architectures, we propose task-based parallelization strategies for block and line computing as well as different data loading strategies to achieve high parallel performance and high data transfer efficiency. Experimental results show that our work on M-DSP exhibits very competitive performance compared to other methods running on CPUs or GPUs. This demonstrates the efficiency of our method and the fact that the vector-SIMD architecture is beneficial for scientific computing with "high density" characteristics, which can exploit its wide vector core and achieve higher performance than its competitors.
Volume 113, Article 102972
Citations: 0
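The gridding entry above describes each output cell as a weighted sum of neighboring point values. A minimal scalar sketch of that idea follows; it is not the authors' M-DSP implementation. The Gaussian kernel, the cutoff `radius`, the normalisation by the weight total, and the names `gaussian_weight` and `grid_samples` are all assumptions made for illustration.

```python
import math

def gaussian_weight(distance, sigma=1.0):
    """Weight of a sample at the given distance from a cell centre (assumed kernel)."""
    return math.exp(-0.5 * (distance / sigma) ** 2)

def grid_samples(samples, width, height, radius=1.5, sigma=1.0):
    """Grid scattered (x, y, value) samples onto a width x height image.

    Each output cell is the normalised weighted sum of the samples that
    lie within `radius` of the cell centre. Cells with no nearby sample stay 0.
    """
    image = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            weighted_sum = 0.0
            weight_total = 0.0
            for x, y, value in samples:
                distance = math.hypot(x - col, y - row)
                if distance <= radius:
                    w = gaussian_weight(distance, sigma)
                    weighted_sum += w * value
                    weight_total += w
            if weight_total > 0.0:
                image[row][col] = weighted_sum / weight_total
    return image

# Two hypothetical samples on a 3 x 1 grid: the middle cell is equidistant
# from both, so it receives their average.
samples = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
image = grid_samples(samples, width=3, height=1)
```

The innermost loop over samples is independent for every output cell, which is the property that makes this kind of computation a candidate for vector-SIMD or GPU execution.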
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102958
Qingxiao Sun, Liu Yi, Hailong Yang, Mingzhen Li, Zhongzhi Luan, Depei Qian
Abstract: Although GPUs have become indispensable in data centers, meeting Quality of Service (QoS) requirements under task consolidation on GPUs is extremely challenging. Previous works mostly rely on static task or resource scheduling and cannot handle QoS violations during runtime. In addition, existing works fail to exploit the computing characteristics of batch tasks, and thus waste opportunities to reduce power consumption while improving GPU utilization. To address the above problems, we propose a new runtime mechanism, SMQoS, that dynamically adjusts the resource allocation during runtime to meet the QoS of latency-sensitive (LS) tasks and determines the optimal resource allocation for batch tasks to improve GPU utilization and power efficiency. We implement the proposed mechanism on both a simulator (SMQoS) and real GPU hardware (RH-SMQoS). The experimental results show that both SMQoS and RH-SMQoS achieve better QoS for LS tasks and higher throughput for batch tasks compared to state-of-the-art works. With a hardware extension, SMQoS can further reduce power consumption by power gating idle computing resources.
Volume 113, Article 102958
Citations: 1
SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102955
Gizen Mutlu, Çiğdem İnan Acı
Abstract: The Support Vector Machine (SVM) method is one of the popular machine learning algorithms, as it gives high accuracy. However, like most machine learning algorithms, the resource consumption of the SVM algorithm in terms of time and memory increases linearly as the dataset grows. In this study, a parallel-hybrid algorithm that combines SVM and Sequential Minimal Optimization (SMO) with Stochastic Gradient Descent (SGD) has been proposed to optimize the calculation of the weight costs. The performance of the proposed SVM-SMO-SGD algorithm was compared with classical SMO and Compute Unified Device Architecture (CUDA) based approaches on well-known datasets (i.e., Diabetes, Healthcare Stroke Prediction, Adults) with 520, 5110, and 32,560 samples, respectively. According to the results, sequential SVM-SMO-SGD is 3.81 times faster and 1.04 times more efficient in RAM consumption than the classical SMO algorithm. The parallel SVM-SMO-SGD algorithm, on the other hand, is 75.47 times faster than the classical SMO algorithm. It is also 1.9 times more efficient in RAM consumption. The overall classification accuracy of all algorithms is 87% on the Diabetes dataset, 95% on the Healthcare Stroke Prediction dataset, and 82% on the Adults dataset.
Volume 113, Article 102955
Citations: 9
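The SVM entry above combines SMO with stochastic gradient descent. As a rough illustration of the SGD half only, here is a minimal sketch of a linear SVM trained with per-sample subgradient steps on the L2-regularised hinge loss. It is a generic textbook-style technique, not the SVM-SMO-SGD algorithm from the paper; the function names, hyperparameters, and toy data are hypothetical.

```python
def train_linear_svm_sgd(points, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Train a linear SVM with per-sample subgradient steps on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    dim = len(points[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The regularisation term shrinks the weights on every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # The sample violates the margin: step along y * x.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0.0 else -1

# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 1.0), (2.5, 3.0), (-2.0, -1.0), (-3.0, -2.0), (-1.5, -2.5)]
labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm_sgd(points, labels)
predictions = [predict(weights, bias, x) for x in points]
```

Each update touches a single sample, which keeps the memory footprint small; a parallel variant would additionally have to decide how updates from different workers are merged, a question this sketch does not address.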
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102952
J. Pronold, J. Jordan, B.J.N. Wylie, I. Kitayama, M. Diesmann, S. Kunkel
Abstract: Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed random graphs of a few million nodes, each with an in-degree and out-degree of several thousand edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called a spike, via its outgoing synapses to the corresponding target neurons. In distributed computing these targets are scattered across thousands of parallel processes. The spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance such as software-induced prefetching and software pipelining can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned with an intrinsically parallel computational problem can alleviate the von Neumann bottleneck of conventional computer architectures.
Volume 113, Article 102952
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167819122000461/pdfft?md5=b8e7064aa5b20b2508d68e7bff9b38e4&pid=1-s2.0-S0167819122000461-main.pdf
Citations: 6
Fast calculation of isostatic compensation correction using the GPU-parallel prism method
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102970
Yan Huang, Qingbin Wang, Minghao Lv, Xingguang Song, Jinkai Feng, Xuli Tan, Ziyan Huang, Chuyuan Zhou
Abstract: Isostatic compensation is a crucial component of crustal structure analysis and geoid calculations in cases of gravity reduction. However, large-scale and high-precision calculations are limited by the inefficiencies of the strict prism method and the low accuracy of the approximate calculation formula. In this study, we propose a new method of terrain grid re-encoding and an eight-component strict prism integral disassembly using a Compute Unified Device Architecture (CUDA) parallel programming platform. We use a fast parallel algorithm for the isostatic compensation correction, using the strict prism method based on CPU + GPU heterogeneous parallelization with efficient task allocation and a GPU thread overloading procedure. The results of this study provide a rigorous, fast, and accurate solution for high-resolution and high-precision isostatic compensation corrections. While ensuring an absolute calculation accuracy of 10⁻⁶ mGal, the maximum acceleration ratio of the calculation was at least 730 using one GPU and 2241 using four GPUs, which shortens the calculation time and improves the calculation efficiency.
Volume 113, Article 102970
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167819122000618/pdfft?md5=c2b82b5c153d0daba6ac23f42fb2b152&pid=1-s2.0-S0167819122000618-main.pdf
Citations: 0
parGeMSLR: A parallel multilevel Schur complement low-rank preconditioning and solution package for general sparse matrices
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-10-01, DOI: 10.1016/j.parco.2022.102956
Tianshi Xu, Vassilis Kalantzis, Ruipeng Li, Yuanzhe Xi, Geoffrey Dillon, Yousef Saad
Abstract: This paper discusses parGeMSLR, a C++/MPI software library for the solution of sparse systems of linear algebraic equations via preconditioned Krylov subspace methods in distributed-memory computing environments. The preconditioner implemented in parGeMSLR is based on algebraic domain decomposition and partitions the symmetrized adjacency graph recursively into several non-overlapping partitions via a p-way vertex separator, where p is an integer multiple of the total number of MPI processes. From a numerical perspective, parGeMSLR builds a Schur complement approximate inverse preconditioner as the sum of the matrix inverse of the interface coupling matrix and a low-rank correction term. To reduce the cost associated with the computation of the approximate inverse matrices, parGeMSLR exploits a multilevel partitioning of the algebraic domain. The parGeMSLR library is implemented on top of the Message Passing Interface and can solve both real and complex linear systems. Furthermore, parGeMSLR can take advantage of hybrid computing environments with in-node access to one or more Graphics Processing Units. Finally, the parallel efficiency (weak and strong scaling) of parGeMSLR is demonstrated on a few model problems arising from discretizations of 3D Partial Differential Equations.
Volume 113, Article 102956
Citations: 0
Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-09-13, DOI: 10.48550/arXiv.2209.06141
S. Lockhart, Amanda Bienz, W. Gropp, Luke N. Olson
Abstract: Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. Presented models suggest staging communicated data through host processes then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data leads to the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix-vector products. Importantly, we include a discussion on the implications model predictions have on communication strategy design for emerging supercomputer architectures.
Volume "20 1", pages 103021 (as listed)
Citations: 2
Energy-efficient scheduling algorithms based on task clustering in heterogeneous spark clusters
IF 1.4 · Zone 4 · Computer Science
Parallel Computing, Pub Date: 2022-09-01, DOI: 10.1016/j.parco.2022.102947
Wenhu Shi, Hongjian Li, Junzhe Guan, Hang Zeng, Rafe Misskat jahan
Abstract: Spark is widely used for its fast in-memory processing. It is important to improve energy efficiency under deadline constraints. In this paper, a Task Performance Clustering of Best Fitting Decrease (TPCBFD) scheduling algorithm is proposed. It divides tasks in Spark into three types, with the different types of tasks being placed on nodes with superior performance. However, the basic computation time for TPCBFD takes up a large proportion of the task execution time, so the Energy-Aware TPCBFD (EATPCBFD) algorithm, based on the proposed energy consumption model, is proposed, focusing on optimizing energy efficiency and Service Level Agreement (SLA) service times. The experimental results show that EATPCBFD increases the average energy efficiency in Spark by 77% and the average passing rate of SLA service time by 14% compared to the comparison algorithms. Under the deadline constraint, EATPCBFD also has higher average energy efficiency than the comparison algorithms.
Volume 112, Article 102947
Citations: 1
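The TPCBFD name in the entry above refers to "Best Fitting Decrease", which appears to build on the classic Best Fit Decreasing bin-packing heuristic. The sketch below shows only that classic heuristic, under the assumption that tasks have a single integer size and all nodes share one capacity; it does not include the task clustering or the energy model from the paper, and `best_fit_decreasing` is a hypothetical name.

```python
def best_fit_decreasing(task_sizes, node_capacity):
    """Pack tasks onto equal-capacity nodes with the Best Fit Decreasing heuristic.

    Tasks are placed largest first; each goes to the open node whose
    remaining capacity is smallest but still sufficient. A new node is
    opened only when no open node can hold the task.
    Returns a list of nodes, each a list of the task sizes assigned to it.
    """
    nodes = []       # task sizes placed on each node
    remaining = []   # free capacity left on each node
    for size in sorted(task_sizes, reverse=True):
        if size > node_capacity:
            raise ValueError(f"task of size {size} exceeds node capacity")
        best = None
        for index, free in enumerate(remaining):
            if size <= free and (best is None or free < remaining[best]):
                best = index
        if best is None:
            nodes.append([size])
            remaining.append(node_capacity - size)
        else:
            nodes[best].append(size)
            remaining[best] -= size
    return nodes

# Hypothetical workload: six tasks packed onto nodes of capacity 10.
placement = best_fit_decreasing([4, 8, 1, 4, 2, 1], node_capacity=10)
print(placement)  # [[8, 2], [4, 4, 1, 1]]
```

Packing tasks tightly onto fewer nodes is one common route to energy savings, since nodes left without work can run at low power; whether and how the paper's algorithms do this is not stated in the abstract.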