A parallel strategy to accelerate neighborhood operation for raster data coordinating CPU and GPU

Impact factor 2.6; Zone 3, Earth Sciences; JCR Q1, Geography
Zhixin Yu, Chen Zhou, Manchun Li
Journal: Cartography and Geographic Information Science
Publication type: Journal Article
Published: 2023-11-07
DOI: https://doi.org/10.1080/15230406.2023.2272660

Abstract

This study presents an asynchronous parallel strategy that coordinates the central processing unit (CPU) and graphics processing unit (GPU) to accelerate neighborhood operation (NO). Specifically, we propose a data partitioning method called multi-anchor task queuing and a task scheduling method called bi-direction task scheduling, which together let the CPU and GPU rapidly find the data blocks they are responsible for and handle their tasks concurrently via a bi-direction merge. Moreover, we optimize the organization of threads distributed among the CPU and GPU. Experimental results show that when a 1.7 GB raster dataset is processed, the speedup ratio achieved by the proposed parallel algorithm reaches 29.63, which is 19% and 18% higher than those of the GPU parallel algorithm and the standard asynchronous parallel algorithm, respectively. Additionally, the load balance index is below 0.085, which is significantly better than the value achieved by a conventional algorithm. Thus, the strategy achieves a higher speedup ratio and a more adaptable load balance, thereby accelerating the NO more efficiently. Further, the impacts of the data volume, computational intensity, organization mode of the GPU threads, and granularity of the GPU stream on the parallel efficiency are evaluated and discussed. We also test the efficiency of four other common NOs with our strategy.

Keywords: Geographical raster data; neighborhood operation; parallel computing; CPU and GPU; load balance

Acknowledgments

The authors sincerely thank the anonymous reviewers and editors for their valuable feedback and constructive comments, which contributed greatly to improving this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

CRediT authorship contribution statement

Zhixin Yu: Conceptualization, Methodology, Software, Visualization, Writing – original draft.
Chen Zhou: Conceptualization, Data Curation, Supervision, Validation, Writing – review & editing.
Manchun Li: Supervision, Writing – review & editing.

Data availability statement

The computer code and sample dataset that support the findings of this study are available at https://www.doi.org/10.17605/OSF.IO/AG3QC. The code was developed using C++. A CPU with multiple cores and a CUDA-enabled GPU are necessary. It is recommended to run the code on OpenMP 2.0, CUDA 11.2 and GDAL 3.2.0 or later.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 42271414 and 41901318].
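The abstract names bi-direction task scheduling but does not spell out its mechanics. One plausible reading, assumed here purely for illustration, is that two processors claim data blocks from opposite ends of a shared queue until they meet, so the split point adapts to their relative speeds. The C++ sketch below simulates that idea with two hypothetical workers and fixed per-block costs. The names `Schedule` and `simulateTwoEnded` and the cost parameters are invented for this example; it is not the authors' implementation, which is available at the OSF link above.

```cpp
#include <cassert>
#include <cstddef>

// Outcome of one simulated run. All names are illustrative assumptions,
// not taken from the paper or its released code.
struct Schedule {
    std::size_t frontBlocks;  // blocks claimed from the front of the queue
    std::size_t backBlocks;   // blocks claimed from the back of the queue
    double frontBusy;         // total compute time of the front worker
    double backBusy;          // total compute time of the back worker

    // Wall-clock time of the run: the slower of the two workers.
    double makespan() const {
        return frontBusy > backBusy ? frontBusy : backBusy;
    }
};

// Simulates two workers draining one queue of nBlocks equal-sized blocks
// from opposite ends. frontCost and backCost are the time each worker
// needs per block. Whichever worker becomes free first claims the next
// block on its own side; ties go to the front worker.
Schedule simulateTwoEnded(std::size_t nBlocks, double frontCost, double backCost) {
    assert(frontCost > 0.0 && backCost > 0.0);
    Schedule s{0, 0, 0.0, 0.0};
    std::size_t lo = 0;        // next unclaimed block at the front
    std::size_t hi = nBlocks;  // one past the last unclaimed block
    while (lo < hi) {
        if (s.frontBusy <= s.backBusy) {
            ++lo;
            ++s.frontBlocks;
            s.frontBusy += frontCost;
        } else {
            --hi;
            ++s.backBlocks;
            s.backBusy += backCost;
        }
    }
    return s;
}
```

With 10 blocks, a front worker that needs 4 time units per block, and a back worker that needs 1, `simulateTwoEnded(10, 4.0, 1.0)` assigns 2 blocks to the front worker and 8 to the back worker, and both finish at time 8. A fixed half-and-half split would instead leave the slower worker busy until time 20, which is the kind of imbalance a dynamic meeting point avoids.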
Source journal
CiteScore: 5.20
Self-citation rate: 20.00%
Articles published: 23
About the journal: Cartography and Geographic Information Science (CaGIS) is the official publication of the Cartography and Geographic Information Society (CaGIS), a member organization of the American Congress on Surveying and Mapping (ACSM). The Cartography and Geographic Information Society supports research, education, and practices that improve the understanding, creation, analysis, and use of maps and geographic information. The society serves as a forum for the exchange of original concepts, techniques, approaches, and experiences by those who design, implement, and use geospatial technologies through the publication of authoritative articles and international papers.