Debasis Mitra, Hui Pan, Fares Alhassen, Youngho Seo
IEEE Nuclear Science Symposium Conference Record, published 2014-11-01. DOI: 10.1109/NSSMIC.2014.7430944 (https://doi.org/10.1109/NSSMIC.2014.7430944)
Citations: 5
Parallelization of Iterative Reconstruction Algorithms in Multiple Modalities.
Abstract
In this work we parallelized the Maximum Likelihood Expectation-Maximization (MLEM) and Ordered Subset Expectation-Maximization (OSEM) algorithms to make reconstruction of multi-pinhole SPECT and cone-beam CT data more efficient. We implemented the parallelized versions of the algorithms on a general-purpose graphics processing unit (GPGPU): an NVIDIA Tesla M2070 GPU with 448 cores and 6 GB of RAM. We compared their run times against those of the corresponding CPU implementations running on an 8-core AMD Opteron 6128 CPU with 32 GB of RAM. We further show how optimizing thread balancing can speed up the GPU implementation.
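For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal CPU sketch of the standard multiplicative EM update, not the authors' GPU implementation. It assumes a small dense system matrix `A` (detector bins by voxels) and measured counts `y`; the function names `forward_project`, `em_update`, `mlem`, and `osem`, and the strided subset layout, are illustrative choices that do not come from the paper.

```python
def forward_project(A, x, rows):
    """Expected counts (A x)_i for the detector bins listed in `rows`."""
    return {i: sum(a * v for a, v in zip(A[i], x)) for i in rows}


def em_update(A, x, y, rows):
    """One multiplicative EM update using only the detector bins in `rows`."""
    proj = forward_project(A, x, rows)
    new_x = []
    for j in range(len(x)):
        # Given the forward projection, each voxel j is updated independently
        # of the others; a GPU version could map this loop to one thread per
        # voxel (an assumption here, the abstract does not give the mapping).
        sens = sum(A[i][j] for i in rows)  # sensitivity of voxel j
        if sens == 0.0:
            new_x.append(x[j])  # voxel not seen by this subset: keep as is
            continue
        back = sum(A[i][j] * y[i] / proj[i] for i in rows if proj[i] > 0.0)
        new_x.append(x[j] * back / sens)
    return new_x


def mlem(A, y, n_iter):
    """MLEM: every iteration uses all detector bins."""
    rows = list(range(len(A)))
    x = [1.0] * len(A[0])  # uniform positive starting image
    for _ in range(n_iter):
        x = em_update(A, x, y, rows)
    return x


def osem(A, y, n_iter, n_subsets):
    """OSEM: each iteration applies the EM update once per subset of bins."""
    # Strided subsets: bin i goes to subset i % n_subsets (illustrative only).
    subsets = [list(range(s, len(A), n_subsets)) for s in range(n_subsets)]
    x = [1.0] * len(A[0])
    for _ in range(n_iter):
        for rows in subsets:
            x = em_update(A, x, y, rows)
    return x
```

Calling `mlem(A, y, n_iter)` or `osem(A, y, n_iter, n_subsets)` with `A` as a list of rows and `y` as a list of counts returns the reconstructed voxel values as a list. OSEM touches only part of the data per update, so it typically needs fewer passes over the data than MLEM to reach a comparable image.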