{"title":"Profiling and Optimization of CT Reconstruction on Nvidia Quadro GV100","authors":"S. Dwivedi, Andreas Heumann","doi":"10.1109/HPEC43674.2020.9286223","DOIUrl":null,"url":null,"abstract":"Computed Tomography (CT) Imaging is a widely used technique for medical and industrial applications. Iterative reconstruction algorithms are desired for improved reconstructed image quality and lower dose, but its computational requirements limit its practical usage. Reconstruction toolkit (RTK) is a package of open source GPU accelerated algorithms for CBCT (cone beam computed tomography). GPU based iterative algorithms gives immense acceleration, but it may not be optimized to use the GPU resources efficiently. Nvidia has released several profilers (Nsight-systems, Nsight-compute) to analyze the GPU implementation of an algorithm from compute utilization and memory efficiency perspective. This paper profiles and analyzes the GPU implementation of iterative FDK algorithm in RTK and optimizes it for computation and memory usage on a Quadro GV100 GPU with 32 GB of memory and over 5000 cuda cores. RTK based GPU accelerated iterative FDK when applied on a 4 byte per pixel input projection dataset of size 1.1 GB (512×512×1024) for 20 iterations, to reconstruct a volume of size 440 MB (512×512×441) with 4 byte per pixel, resulted in total runtime of ~11.2 seconds per iteration. Optimized RTK based iterative FDK presented in this paper took ~1.3 seconds per iteration.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Computed Tomography (CT) Imaging is a widely used technique for medical and industrial applications. Iterative reconstruction algorithms are desired for improved reconstructed image quality and lower dose, but its computational requirements limit its practical usage. Reconstruction toolkit (RTK) is a package of open source GPU accelerated algorithms for CBCT (cone beam computed tomography). GPU based iterative algorithms gives immense acceleration, but it may not be optimized to use the GPU resources efficiently. Nvidia has released several profilers (Nsight-systems, Nsight-compute) to analyze the GPU implementation of an algorithm from compute utilization and memory efficiency perspective. This paper profiles and analyzes the GPU implementation of iterative FDK algorithm in RTK and optimizes it for computation and memory usage on a Quadro GV100 GPU with 32 GB of memory and over 5000 cuda cores. RTK based GPU accelerated iterative FDK when applied on a 4 byte per pixel input projection dataset of size 1.1 GB (512×512×1024) for 20 iterations, to reconstruct a volume of size 440 MB (512×512×441) with 4 byte per pixel, resulted in total runtime of ~11.2 seconds per iteration. Optimized RTK based iterative FDK presented in this paper took ~1.3 seconds per iteration.