Hongjian Wang, T. Huynh, H. Gemmeke, T. Hopp, J. Hesser
GPU Acceleration of Wave Based Transmission Tomography
2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), published 2019-04-08
DOI: 10.1109/ISBI.2019.8759453 (https://doi.org/10.1109/ISBI.2019.8759453)
Citations: 0
Abstract
To accelerate 3D ultrasound computed tomography, we parallelize the most time-consuming part of a paraxial forward model on a GPU, where a large number of complex multiplications and 2D Fourier transforms have to be performed iteratively. We test our GPU implementation on a synthesized symmetric breast phantom of different sizes. In the best case, for a single emitter position, the speedup on a desktop GPU reaches 23 times when the data transfer time is included, and 100 times when only the GPU parallel computing time is considered. In the worst case, a less powerful laptop GPU still achieves a speedup of 2.5 times over a six-core desktop CPU when the data transfer time is included. As for the correctness of the values computed on the GPU, the maximum percentage deviation in the L2 norm is only 0.014%.
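The two figures of merit quoted in the abstract, speedup with and without data transfer and the percentage deviation in the L2 norm, can be illustrated with a minimal Python sketch. The abstract does not define either quantity formally, so the definitions below are assumptions: the deviation is taken as 100 * ||gpu - cpu||_2 / ||cpu||_2, and the speedup as CPU time divided by GPU compute time plus transfer time. All function names are hypothetical and are not taken from the paper.

```python
import math


def percent_l2_deviation(reference, candidate):
    """Relative L2 deviation of `candidate` from `reference`, in percent.

    Assumed definition: 100 * ||candidate - reference||_2 / ||reference||_2.
    Works for real or complex samples (e.g. CPU vs. GPU wave-field values).
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    diff_norm = math.sqrt(sum(abs(c - r) ** 2 for r, c in zip(reference, candidate)))
    ref_norm = math.sqrt(sum(abs(r) ** 2 for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference has zero L2 norm")
    return 100.0 * diff_norm / ref_norm


def speedup(cpu_time, gpu_compute_time, transfer_time=0.0):
    """Speedup of the GPU over the CPU; pass transfer_time=0 for compute only."""
    return cpu_time / (gpu_compute_time + transfer_time)


def implied_transfer_fraction(speedup_with_transfer, speedup_compute_only):
    """Share of total GPU-side time spent on data transfer.

    Assumes both speedups were measured against the same CPU time T:
    total = T / S_with, compute = T / S_only, so the transfer share
    is (total - compute) / total = 1 - S_with / S_only.
    """
    return 1.0 - speedup_with_transfer / speedup_compute_only


if __name__ == "__main__":
    # Speedups reported in the abstract for the best case.
    print(implied_transfer_fraction(23.0, 100.0))
```

If the 23 times and 100 times figures refer to the same run, which the abstract suggests but does not state explicitly, this arithmetic implies that roughly 77% of the GPU-side wall time in the best case is spent on data transfer rather than on computation.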