A parallel strategy to accelerate neighborhood operation for raster data coordinating CPU and GPU
Authors: Zhixin Yu, Chen Zhou, Manchun Li
Journal: Cartography and Geographic Information Science (JCR Q1, Geography; impact factor 2.6)
Published: 2023-11-07
DOI: https://doi.org/10.1080/15230406.2023.2272660
ABSTRACT
This study presents an asynchronous parallel strategy that coordinates the central processing unit (CPU) and graphics processing unit (GPU) to accelerate neighborhood operation (NO). Specifically, we propose a data partitioning method called multi-anchor task queuing and a task scheduling method called bi-direction task scheduling, which let the CPU and GPU rapidly locate the data blocks they are responsible for and handle their tasks concurrently via a bi-direction merge. Moreover, we optimize the organization of threads distributed among the CPU and GPU. Experimental results show that when a 1.7 GB raster dataset is processed, the speedup ratio achieved by the proposed parallel algorithm reaches 29.63, which is 19% and 18% higher than those of the GPU and standard asynchronous parallel algorithms, respectively. Additionally, the load balance index is below 0.085, which is significantly better than the value achieved by a conventional algorithm. Thus, the strategy achieves a higher speedup ratio and a more adaptable load balance, thereby accelerating the NO more efficiently. Further, the impacts of the data volume, computational intensity, organization mode of the GPU threads, and granularity of the GPU stream on the parallel efficiency are evaluated and discussed. We also test the efficiency of four other common NOs with our strategy.

KEYWORDS: Geographical raster data; neighborhood operation; parallel computing; CPU and GPU; load balance

Acknowledgments
The authors sincerely thank the anonymous reviewers and editors for their valuable feedback and constructive comments, which greatly contributed to improving this paper.

Disclosure statement
No potential conflict of interest was reported by the author(s).

CRediT authorship contribution statement
Zhixin Yu: Conceptualization, Methodology, Software, Visualization, Writing – original draft.
Chen Zhou: Conceptualization, Data Curation, Supervision, Validation, Writing – review & editing.
Manchun Li: Supervision, Writing – review & editing.

Data availability statement
The computer code and sample dataset that support the findings of this study are available at https://www.doi.org/10.17605/OSF.IO/AG3QC. The code was developed using C++. A CPU with multiple cores and a CUDA-enabled GPU are necessary. It is recommended to run the code with OpenMP 2.0, CUDA 11.2 and GDAL 3.2.0 or later.

Additional information
Funding
This work was supported by the National Natural Science Foundation of China [grant numbers 42271414 and 41901318].
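The abstract names bi-direction task scheduling but does not spell out its mechanics. As a rough illustration only, the C++ sketch below assumes one plausible reading: data blocks sit in a single ordered queue, one worker claims blocks from the front, the other claims from the back, and work stops when the two ends meet. The names `BiDirectionalQueue`, `take_front`, `take_back` and `run_bidirectional` are hypothetical and are not taken from the authors' code. Both workers are ordinary CPU threads here; the second merely stands in for a GPU dispatcher.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical two-ended block queue (an assumption, not the paper's code).
// Blocks are identified by index in [0, block_count).
class BiDirectionalQueue {
public:
    explicit BiDirectionalQueue(std::size_t block_count)
        : head_(0), tail_(block_count) {}

    // Claim the next unprocessed block from the front.
    // Returns false once the two ends have met.
    bool take_front(std::size_t& index) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (head_ >= tail_) return false;
        index = head_++;
        return true;
    }

    // Claim the next unprocessed block from the back.
    bool take_back(std::size_t& index) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (head_ >= tail_) return false;
        index = --tail_;
        return true;
    }

private:
    std::mutex mutex_;
    std::size_t head_;  // first unclaimed block
    std::size_t tail_;  // one past the last unclaimed block
};

// Runs two workers over the queue and reports who processed each block:
// 0 = front worker (stand-in for the CPU), 1 = back worker (stand-in for
// the GPU dispatcher). Real block processing is replaced by a simple mark.
std::vector<int> run_bidirectional(std::size_t block_count) {
    BiDirectionalQueue queue(block_count);
    std::vector<int> owner(block_count, -1);  // -1 = not yet processed

    std::thread front_worker([&] {
        std::size_t i = 0;
        while (queue.take_front(i)) {
            assert(owner[i] == -1);  // a block must never be claimed twice
            owner[i] = 0;
        }
    });
    std::thread back_worker([&] {
        std::size_t i = 0;
        while (queue.take_back(i)) {
            assert(owner[i] == -1);
            owner[i] = 1;
        }
    });

    front_worker.join();
    back_worker.join();
    return owner;
}
```

Calling `run_bidirectional(64)` returns a vector in which every block is marked with exactly one owner, with the front worker's blocks forming a prefix and the back worker's blocks a suffix. The split point is not fixed in advance: whichever worker is faster simply claims more blocks, which is the load-balancing property this kind of scheme is meant to give. A mutex keeps the sketch short; a production version would more likely use a lock-free counter and real CUDA streams.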
About the journal:
Cartography and Geographic Information Science (CaGIS) is the official publication of the Cartography and Geographic Information Society (CaGIS), a member organization of the American Congress on Surveying and Mapping (ACSM). The Cartography and Geographic Information Society supports research, education, and practices that improve the understanding, creation, analysis, and use of maps and geographic information. The society serves as a forum for the exchange of original concepts, techniques, approaches, and experiences by those who design, implement, and use geospatial technologies through the publication of authoritative articles and international papers.