Charles Guan, Alexander P Rockhill, Masashi Sode, Gianmarco Pinton
{"title":"mach: ultrafast ultrasound beamforming.","authors":"Charles Guan, Alexander P Rockhill, Masashi Sode, Gianmarco Pinton","doi":"10.1117/1.JMI.13.6.062203","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Volumetric ultrafast ultrasound produces massive datasets with high frame rates, dense reconstruction grids, and large channel counts. Beamforming computational demands limit research throughput and prevent real-time applications in emerging modalities such as elastography, functional neuroimaging, and microscopy.</p><p><strong>Approach: </strong>We developed mach, an open-source, GPU-accelerated beamformer with a highly optimized delay-and-sum CUDA kernel and an accessible Python interface. mach uses a hybrid delay computation strategy that substantially reduces memory overhead compared with fully precomputed approaches. The CUDA implementation optimizes memory layout for coalesced access and reuses delay computations across frames via shared memory. We benchmarked mach on the PyMUST rotating disk dataset and validated numerical accuracy against existing open-source beamformers.</p><p><strong>Results: </strong>mach processes 1.1 trillion points per second on a consumer-grade GPU, achieving > <math><mrow><mn>10</mn> <mo>×</mo></mrow> </math> faster performance than existing open-source GPU beamformers. On the PyMUST rotating disk benchmark, mach completes reconstruction in 0.23 ms, 6× faster than the acoustic round-trip time to the imaging depth. Validation against other beamformers confirms numerical accuracy with errors below <math><mrow><mo>-</mo> <mn>60</mn> <mtext> </mtext> <mi>dB</mi></mrow> </math> for Power Doppler and <math><mrow><mo>-</mo> <mn>120</mn> <mtext> </mtext> <mi>dB</mi></mrow> </math> for B-mode.</p><p><strong>Conclusions: </strong>mach achieves 1.1 trillion points per second throughput, enabling real-time 3D ultrafast ultrasound reconstruction for the first time on consumer-grade hardware. By eliminating the beamforming bottleneck, mach enables real-time applications such as 3D functional neuroimaging, intraoperative guidance, and ultrasound localization microscopy. mach is freely available at https://github.com/Forest-Neurotech/mach.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"13 6","pages":"062203"},"PeriodicalIF":1.7000,"publicationDate":"2026-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13053060/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.13.6.062203","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/4/6 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose: Volumetric ultrafast ultrasound produces massive datasets with high frame rates, dense reconstruction grids, and large channel counts. Beamforming computational demands limit research throughput and prevent real-time applications in emerging modalities such as elastography, functional neuroimaging, and microscopy.
Approach: We developed mach, an open-source, GPU-accelerated beamformer with a highly optimized delay-and-sum CUDA kernel and an accessible Python interface. mach uses a hybrid delay computation strategy that substantially reduces memory overhead compared with fully precomputed approaches. The CUDA implementation optimizes memory layout for coalesced access and reuses delay computations across frames via shared memory. We benchmarked mach on the PyMUST rotating disk dataset and validated numerical accuracy against existing open-source beamformers.
Results: mach processes 1.1 trillion points per second on a consumer-grade GPU, achieving > faster performance than existing open-source GPU beamformers. On the PyMUST rotating disk benchmark, mach completes reconstruction in 0.23 ms, 6× faster than the acoustic round-trip time to the imaging depth. Validation against other beamformers confirms numerical accuracy with errors below for Power Doppler and for B-mode.
Conclusions: mach achieves 1.1 trillion points per second throughput, enabling real-time 3D ultrafast ultrasound reconstruction for the first time on consumer-grade hardware. By eliminating the beamforming bottleneck, mach enables real-time applications such as 3D functional neuroimaging, intraoperative guidance, and ultrasound localization microscopy. mach is freely available at https://github.com/Forest-Neurotech/mach.
期刊介绍:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.