cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs

Proceedings of the 2018 International Conference on Supercomputing Pub Date : 2018-06-12 DOI:10.1145/3205289.3205309

Xiuhong Li, Yun Liang, Wentai Zhang, Taide Liu, Haochen Li, Guojie Luo, M. Jiang

{"title":"cuMBIR: An Efficient Framework for Low-dose X-ray CT Image Reconstruction on GPUs","authors":"Xiuhong Li, Yun Liang, Wentai Zhang, Taide Liu, Haochen Li, Guojie Luo, M. Jiang","doi":"10.1145/3205289.3205309","DOIUrl":null,"url":null,"abstract":"Low-dose X-ray computed tomography (XCT) is a popular imaging technique to visualize the inside structure of object non-destructively. Model-based Iterative Reconstruction (MBIR) method can reconstruct high-quality image but at the cost of large computational demands. Therefore, MBIR of ten resorts to the platforms with hardware accelerators such as GPUs to speed up the reconstruction process. For MBIR, the reconstruction process is to minimize an objective function by updating image iteratively. The X-ray source emits large amounts of X-rays from various views to cover the object as much as possible. Different X-rays always have complex and irregular geometric relationship. This inherent irregularity makes the minimization process of the objective function on GPUs very challenging. First, different implementations of the minimization of objective function have different impacts on the convergence and GPU resource utilization. To this end, we explore different solvers to the minimization problem and different parallelism granularities for GPU kernel design. Second, the complex and irregular geometric relationship of X-rays introduces irregular memory behaviors. Two nearby X-rays may intersect and thus incur memory collisions, while two far away X-rays may incur non-coalesced memory accesses. We design a unified thread mapping algorithm to guide the mapping from X-rays to threads, which can optimize the memory collisions and non-coalesced memory accesses together. Finally, we present a series of architecture level optimizations to fully release the horse power of GPUs. Evaluation results demonstrate that cuMBIR can achieve 1.48X speedup over the state-of-the-art implementation on GPUs.","PeriodicalId":441217,"journal":{"name":"Proceedings of the 2018 International Conference on Supercomputing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3205289.3205309","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

Abstract

Low-dose X-ray computed tomography (XCT) is a popular imaging technique to visualize the inside structure of object non-destructively. Model-based Iterative Reconstruction (MBIR) method can reconstruct high-quality image but at the cost of large computational demands. Therefore, MBIR of ten resorts to the platforms with hardware accelerators such as GPUs to speed up the reconstruction process. For MBIR, the reconstruction process is to minimize an objective function by updating image iteratively. The X-ray source emits large amounts of X-rays from various views to cover the object as much as possible. Different X-rays always have complex and irregular geometric relationship. This inherent irregularity makes the minimization process of the objective function on GPUs very challenging. First, different implementations of the minimization of objective function have different impacts on the convergence and GPU resource utilization. To this end, we explore different solvers to the minimization problem and different parallelism granularities for GPU kernel design. Second, the complex and irregular geometric relationship of X-rays introduces irregular memory behaviors. Two nearby X-rays may intersect and thus incur memory collisions, while two far away X-rays may incur non-coalesced memory accesses. We design a unified thread mapping algorithm to guide the mapping from X-rays to threads, which can optimize the memory collisions and non-coalesced memory accesses together. Finally, we present a series of architecture level optimizations to fully release the horse power of GPUs. Evaluation results demonstrate that cuMBIR can achieve 1.48X speedup over the state-of-the-art implementation on GPUs.

查看原文本刊更多论文

基于gpu的低剂量x线CT图像重构框架

低剂量x射线计算机断层扫描(XCT)是一种常用的对物体内部结构进行无损成像的成像技术。基于模型的迭代重建(MBIR)方法可以重建出高质量的图像，但需要大量的计算量。因此，MBIR通常会借助带有gpu等硬件加速器的平台来加快重建过程。对于MBIR，重建过程是通过迭代更新图像来最小化目标函数。x射线源从不同的角度发射大量的x射线，以尽可能多地覆盖物体。不同的x射线总是具有复杂而不规则的几何关系。这种固有的不规则性使得gpu上目标函数的最小化过程非常具有挑战性。首先，目标函数最小化的不同实现方式对收敛性和GPU资源利用率有不同的影响。为此，我们探索了GPU内核设计中最小化问题的不同求解方法和不同并行度粒度。其次，x射线复杂而不规则的几何关系引入了不规则的记忆行为。两个附近的x射线可能相交，从而导致内存碰撞，而两个遥远的x射线可能导致非合并内存访问。我们设计了一个统一的线程映射算法来引导x射线到线程的映射，可以同时优化内存冲突和非合并内存访问。最后，我们提出了一系列架构级优化，以充分释放gpu的马力。评估结果表明，与gpu上最先进的实现相比，cuMBIR可以实现1.48倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 2018 International Conference on Supercomputing

自引率

0.00%

发文量