Sebastian Kazmarek Præsius;Lasse Thurmann Jørgensen;Jørgen Arendt Jensen
Title: Real-Time Full-Volume Row-Column Imaging
Journal: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 72, no. 1, pp. 109-126
Published: 2024-12-02
DOI: 10.1109/TUFFC.2024.3509683
URL: https://ieeexplore.ieee.org/document/10772135/
Abstract
An implementation of volumetric beamforming for row-column addressed (RCA) arrays is proposed, with optimizations for graphics processing units (GPUs). It is hypothesized that entire volumes can be imaged in real time by a consumer-class GPU at an emission rate $\geq 12$ kHz. A separable beamforming algorithm was used to reduce the number of calculations with a negligible impact on the image quality. Here, a single image was beamformed for each emission and then extrapolated to reproduce the volume, which resulted in $65\times$ fewer calculations per volume. Reusing computations and samples among adjacent pixels and frames reduced the amount of overhead and the number of load instructions, increasing performance. A GPU beamformer, written in compute unified device architecture (CUDA) C++, was modified to implement the dual-stage imaging with these optimizations. In vivo rat kidney data were acquired using a 6-MHz Vermon 128 + 128 RCA probe and a Verasonics Vantage 256 scanner. The acquisition used 96 defocused emissions at a 12-kHz rate, giving a volume acquisition rate of 125 Hz. Processing time, including all preprocessing, was measured for an NVIDIA GeForce RTX 4090 GPU, and the resulting beamforming rate was 1440 volumes/s, greatly exceeding the real-time rate. Based on the GPU's floating-point throughput, this corresponds to 22% of the theoretically achievable rate. High efficiency was also shown for an RTX 2080 Ti and an RTX 3090, both achieving real-time imaging. This shows that 3-D imaging can be performed in real time with a setup similar to that of 2-D imaging: a single graphics card, one scanner, and 128 transmit/receive channels.
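The rates quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is written in Python purely for illustration (the paper's beamformer is in CUDA C++); all function and constant names are hypothetical. It recomputes the volume acquisition rate from the emission rate and the number of emissions per volume, and compares it with the measured beamforming rate. The "implied peak rate" is only a back-of-the-envelope inversion of the stated 22% efficiency, not a figure reported by the authors.

```python
# Illustrative arithmetic only; names are hypothetical and the input
# values are taken from the abstract.

def volume_rate_hz(emission_rate_hz: float, emissions_per_volume: int) -> float:
    """Volumes acquired per second when each volume needs a fixed number of emissions."""
    return emission_rate_hz / emissions_per_volume


def real_time_margin(beamforming_rate_hz: float, acquisition_rate_hz: float) -> float:
    """How many times faster the beamformer runs than the acquisition."""
    return beamforming_rate_hz / acquisition_rate_hz


def sustained_emission_rate_hz(beamforming_rate_hz: float, emissions_per_volume: int) -> float:
    """Emissions per second the beamformer processes at the measured volume rate."""
    return beamforming_rate_hz * emissions_per_volume


def implied_peak_rate_hz(beamforming_rate_hz: float, efficiency: float) -> float:
    """Assumption: inverts the stated efficiency to estimate the theoretical volume rate."""
    return beamforming_rate_hz / efficiency


EMISSION_RATE_HZ = 12_000.0      # 12-kHz emission rate
EMISSIONS_PER_VOLUME = 96        # defocused emissions per volume
BEAMFORMING_RATE_HZ = 1440.0     # measured on the RTX 4090
EFFICIENCY = 0.22                # fraction of the theoretically achievable rate

if __name__ == "__main__":
    acq = volume_rate_hz(EMISSION_RATE_HZ, EMISSIONS_PER_VOLUME)
    print(f"Volume acquisition rate: {acq:.0f} Hz")
    print(f"Real-time margin: {real_time_margin(BEAMFORMING_RATE_HZ, acq):.2f}x")
    print(f"Sustained emission rate: "
          f"{sustained_emission_rate_hz(BEAMFORMING_RATE_HZ, EMISSIONS_PER_VOLUME):.0f} emissions/s")
    print(f"Implied peak rate (assumption): "
          f"{implied_peak_rate_hz(BEAMFORMING_RATE_HZ, EFFICIENCY):.0f} volumes/s")
```

With the abstract's numbers, 12 000 emissions/s divided by 96 emissions per volume gives the stated 125 Hz, and 1440 volumes/s is about 11.5 times that rate, equivalent to processing 138 240 emissions per second.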
About the journal:
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.