Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs

Duhu Man, K. Uda, Hironobu Ueyama, Yasuaki Ito, K. Nakano
DOI: 10.1109/IC-NC.2010.55 (https://doi.org/10.1109/IC-NC.2010.55)
Published in: 2010 First International Conference on Networking and Computing
Publication date: 2010-11-17
Citations: 24

Abstract

Given a 2-D binary image of size $n \times n$, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. A known sequential algorithm computes the EDM in $O(n^2)$ time, which is optimal. Work-time optimal parallel algorithms for the shared memory model have also been presented, but they are too complicated to implement on existing shared memory parallel machines. The main contribution of this paper is a simple parallel algorithm for the EDM and its implementation on two parallel platforms: multicore processors and a Graphics Processing Unit (GPU). Specifically, we have implemented the algorithm on a Linux server with four Intel six-core processors (Intel Xeon X7460, 2.66 GHz) and on a modern GPU system, the Tesla C1060. The experimental results show that, for an input binary image of size $10000 \times 10000$, the multicore implementation achieves a speedup factor of 18 over a sequential algorithm running on a single processor of the same system, while the GPU implementation achieves a speedup factor of 5 over the sequential implementation for the same input.
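
To make the EDM definition above concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the paper, which the abstract does not describe; it only spells out what each output element contains. The function name `euclidean_distance_map`, the convention that 1 marks a black pixel, and the use of infinity when no black pixel exists are all assumptions made for illustration.

```python
import math


def euclidean_distance_map(image):
    """Brute-force EDM: distance from every pixel to its nearest black pixel.

    Assumption: `image` is a list of equal-length rows in which 1 marks a
    black pixel and 0 a white pixel.
    """
    # Collect the coordinates of all black pixels once.
    black = [(r, c)
             for r, row in enumerate(image)
             for c, value in enumerate(row)
             if value == 1]
    if not black:
        # Assumption: with no black pixel the distance is reported as infinity.
        return [[math.inf] * len(row) for row in image]
    # For each pixel, take the minimum distance over all black pixels.
    return [[min(math.hypot(r - br, c - bc) for br, bc in black)
             for c in range(len(row))]
            for r, row in enumerate(image)]


image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
edm = euclidean_distance_map(image)
for row in edm:
    print(["%.3f" % d for d in row])
```

This sketch compares every pixel against every black pixel, so for an $n \times n$ image it can take up to $O(n^4)$ time, far from the $O(n^2)$ bound mentioned in the abstract. It is useful mainly as a reference for checking faster implementations on small inputs. Each output row depends only on the input image, so rows can in principle be computed independently of one another.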