Developing medical ultrasound imaging application across GPU, FPGA, and CPU using oneAPI

Yong Wang, Yongfa Zhou, Q. Wang, Wang Yang, Qing Xu, Chen Wang
International Workshop on OpenCL, published 2021-04-27. DOI: https://doi.org/10.1145/3456669.3456680
Citations: 4

Abstract

Diagnostic ultrasound is a rapidly developing imaging technology that is widely used in the clinic. A typical ultrasound imaging pipeline includes the following algorithms: beamforming, envelope detection, log compression, and scan conversion [1]. Traditionally, ultrasound imaging has been implemented with application-specific integrated circuits (ASICs) and FPGAs because of its high-throughput and massive data-processing requirements. With the development of GPGPU and its programming environments (e.g., CUDA), researchers have begun to implement ultrasound imaging algorithms in software [2], [3]. Two factors currently limit the development of ultrasound imaging. First, implementing ultrasound imaging algorithms with a hardware development approach is complex, time-consuming, and inflexible. Second, existing CUDA-based ultrasound imaging implementations are limited to Nvidia hardware, which prevents them from being applied to other architectures. oneAPI is a cross-platform, unified programming environment developed by Intel. It enables heterogeneous computing across multiple hardware architectures using Data Parallel C++ (DPC++). This new programming suite can address both problems: programming FPGAs in a high-level language such as DPC++ can accelerate ultrasound imaging application development, and SYCL-based ultrasound imaging applications can be easily migrated to other vendors' hardware. To implement an ultrasound imaging application across multiple architectures (e.g., GPU, FPGA, and CPU) in a unified programming environment, we migrated SUPRA [4], a CUDA-based open-source ultrasound imaging project. The migration was performed with the oneAPI compatibility tool (dpct). After migration, the code was tuned to run on GPU, FPGA, and CPU.

In this talk, we will discuss our experience with the complete process of migrating CUDA code to oneAPI. First, we will present the whole process of migrating a CUDA code base with dpct, including usage, code modification, API comparison, and build instructions. Second, we will analyze the computational characteristics of the ultrasound imaging algorithms and show how to optimize the application on Intel GPUs, including the use of ESIMD. Third, we will highlight our early experience of tuning the migrated code to target FPGAs, including rewriting device code for the FPGA and programming techniques that improve FPGA performance. We will also compare the device code for GPU and FPGA. Last, we will compare the performance and computation results of the ultrasound imaging algorithms on different hardware, including Intel GPUs (integrated and discrete), an Intel Arria 10 FPGA, an Intel CPU, an Nvidia GTX 1080 GPU, and a GTX 960M GPU.
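
Two of the pipeline stages named in the abstract, envelope detection and log compression, can be illustrated with a small CPU reference sketch. This is not SUPRA's code or the authors' oneAPI kernels. It assumes the input is already demodulated into in-phase/quadrature (IQ) samples, and the function names `envelope` and `log_compress`, the 60 dB dynamic range, and the 8-bit output scale are all hypothetical choices made for illustration.

```python
import math


def envelope(i_samples, q_samples):
    """Envelope detection (assumed IQ input): magnitude of each I/Q pair."""
    return [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]


def log_compress(env, dynamic_range_db=60.0):
    """Map envelope amplitudes to 0..255 gray levels on a dB scale.

    The 60 dB default is an illustrative assumption, not a value
    taken from the abstract.
    """
    peak = max(env)
    if peak <= 0.0:
        # No signal at all: return a black line.
        return [0] * len(env)
    pixels = []
    for amplitude in env:
        if amplitude <= 0.0:
            pixels.append(0)
            continue
        # Amplitude relative to the peak, in decibels (always <= 0).
        db = 20.0 * math.log10(amplitude / peak)
        # Clip everything below the chosen dynamic range.
        db = max(db, -dynamic_range_db)
        # Shift [-range, 0] dB to [0, 255].
        pixels.append(round(255.0 * (db + dynamic_range_db) / dynamic_range_db))
    return pixels


if __name__ == "__main__":
    i_line = [3.0, 0.3, 0.0]
    q_line = [4.0, 0.4, 0.0]
    print(log_compress(envelope(i_line, q_line)))
```

Both stages are element-wise, so each output sample depends only on its own input sample (plus one shared peak value). That independence is what makes such stages natural candidates for data-parallel kernels on the GPU, FPGA, and CPU targets the abstract lists.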