Rui Sun , Chuanba Liu , Wenshuo Wang , Yimin Song , Tao Sun
{"title":"UltrasOM:一个基于曼巴的网络,用于使用光流进行三维手绘超声重建","authors":"Rui Sun , Chuanba Liu , Wenshuo Wang , Yimin Song , Tao Sun","doi":"10.1016/j.cmpb.2025.108843","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Three-dimensional (3D) ultrasound (US) reconstruction is of significant value in clinical diagnosis, characterized by its safety, portability, low cost, and high real-time capabilities. 3D freehand ultrasound reconstruction aims to eliminate the need for tracking devices, relying solely on image data to infer the spatial relationships between frames. However, inherent jitter during handheld scanning introduces significant inaccuracies, making current methods ineffective in precisely predicting the spatial motions of ultrasound image frames. This leads to substantial cumulative errors over long sequence modeling, resulting in deformations or artifacts in the reconstructed volume. To address these challenges, we proposed UltrasOM, a 3D ultrasound reconstruction network designed for spatial relative motion estimation.</div></div><div><h3>Methods</h3><div>Initially, we designed a video embedding module that integrates optical flow dynamics with original static information to enhance motion change features between frames. Next, we developed a Mamba-based spatiotemporal attention module, utilizing multi-layer stacked Space-Time Blocks to effectively capture global spatiotemporal correlations within video frame sequences. Finally, we incorporated correlation loss and motion speed loss to prevent overfitting related to scanning speed and pose, enhancing the model's generalization capability.</div></div><div><h3>Results</h3><div>Experimental results on a dataset of 200 forearm cases, comprising 58,011 frames, demonstrated that the proposed method achieved a final drift rate (FDR) of 10.24 %, a frame-to-frame distance error (DE) of 7.34 mm, a symmetric Hausdorff distance error (HD) of 10.81 mm, and a mean angular error (MEA) of 2.05°, outperforming state-of-the-art methods by 13.24 %, 15.11 %, 3.57 %, and 6.32 %, respectively.</div></div><div><h3>Conclusion</h3><div>By integrating optical flow features and deeply exploring contextual spatiotemporal dependencies, the proposed network can directly predict the relative motions between multiple frames of ultrasound images without the need for tracking, surpassing the accuracy of existing methods.</div></div>","PeriodicalId":10624,"journal":{"name":"Computer methods and programs in biomedicine","volume":"268 ","pages":"Article 108843"},"PeriodicalIF":4.9000,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UltrasOM: A mamba-based network for 3D freehand ultrasound reconstruction using optical flow\",\"authors\":\"Rui Sun , Chuanba Liu , Wenshuo Wang , Yimin Song , Tao Sun\",\"doi\":\"10.1016/j.cmpb.2025.108843\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Three-dimensional (3D) ultrasound (US) reconstruction is of significant value in clinical diagnosis, characterized by its safety, portability, low cost, and high real-time capabilities. 3D freehand ultrasound reconstruction aims to eliminate the need for tracking devices, relying solely on image data to infer the spatial relationships between frames. However, inherent jitter during handheld scanning introduces significant inaccuracies, making current methods ineffective in precisely predicting the spatial motions of ultrasound image frames. This leads to substantial cumulative errors over long sequence modeling, resulting in deformations or artifacts in the reconstructed volume. To address these challenges, we proposed UltrasOM, a 3D ultrasound reconstruction network designed for spatial relative motion estimation.</div></div><div><h3>Methods</h3><div>Initially, we designed a video embedding module that integrates optical flow dynamics with original static information to enhance motion change features between frames. Next, we developed a Mamba-based spatiotemporal attention module, utilizing multi-layer stacked Space-Time Blocks to effectively capture global spatiotemporal correlations within video frame sequences. Finally, we incorporated correlation loss and motion speed loss to prevent overfitting related to scanning speed and pose, enhancing the model's generalization capability.</div></div><div><h3>Results</h3><div>Experimental results on a dataset of 200 forearm cases, comprising 58,011 frames, demonstrated that the proposed method achieved a final drift rate (FDR) of 10.24 %, a frame-to-frame distance error (DE) of 7.34 mm, a symmetric Hausdorff distance error (HD) of 10.81 mm, and a mean angular error (MEA) of 2.05°, outperforming state-of-the-art methods by 13.24 %, 15.11 %, 3.57 %, and 6.32 %, respectively.</div></div><div><h3>Conclusion</h3><div>By integrating optical flow features and deeply exploring contextual spatiotemporal dependencies, the proposed network can directly predict the relative motions between multiple frames of ultrasound images without the need for tracking, surpassing the accuracy of existing methods.</div></div>\",\"PeriodicalId\":10624,\"journal\":{\"name\":\"Computer methods and programs in biomedicine\",\"volume\":\"268 \",\"pages\":\"Article 108843\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer methods and programs in biomedicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169260725002603\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169260725002603","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
UltrasOM: A mamba-based network for 3D freehand ultrasound reconstruction using optical flow
Background
Three-dimensional (3D) ultrasound (US) reconstruction is of significant value in clinical diagnosis, characterized by its safety, portability, low cost, and high real-time capabilities. 3D freehand ultrasound reconstruction aims to eliminate the need for tracking devices, relying solely on image data to infer the spatial relationships between frames. However, inherent jitter during handheld scanning introduces significant inaccuracies, making current methods ineffective in precisely predicting the spatial motions of ultrasound image frames. This leads to substantial cumulative errors over long sequence modeling, resulting in deformations or artifacts in the reconstructed volume. To address these challenges, we proposed UltrasOM, a 3D ultrasound reconstruction network designed for spatial relative motion estimation.
Methods
Initially, we designed a video embedding module that integrates optical flow dynamics with original static information to enhance motion change features between frames. Next, we developed a Mamba-based spatiotemporal attention module, utilizing multi-layer stacked Space-Time Blocks to effectively capture global spatiotemporal correlations within video frame sequences. Finally, we incorporated correlation loss and motion speed loss to prevent overfitting related to scanning speed and pose, enhancing the model's generalization capability.
Results
Experimental results on a dataset of 200 forearm cases, comprising 58,011 frames, demonstrated that the proposed method achieved a final drift rate (FDR) of 10.24 %, a frame-to-frame distance error (DE) of 7.34 mm, a symmetric Hausdorff distance error (HD) of 10.81 mm, and a mean angular error (MEA) of 2.05°, outperforming state-of-the-art methods by 13.24 %, 15.11 %, 3.57 %, and 6.32 %, respectively.
Conclusion
By integrating optical flow features and deeply exploring contextual spatiotemporal dependencies, the proposed network can directly predict the relative motions between multiple frames of ultrasound images without the need for tracking, surpassing the accuracy of existing methods.
期刊介绍:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.