Single-View Fluoroscopic X-Ray Pose Estimation: A Comparison of Alternative Loss Functions and Volumetric Scene Representations

Journal of Imaging Informatics in Medicine. Pub Date: 2024-12-13. DOI: 10.1007/s10278-024-01354-w

Chaochao Zhou, Syed Hasib Akhter Faruqui, Dayeong An, Abhinav Patel, Ramez N Abdalla, Michael C Hurley, Ali Shaibani, Matthew B Potts, Babak S Jahromi, Sameer A Ansari, Donald R Cantrell


Citations: 0

Abstract



Many tasks performed in image-guided procedures can be cast as pose estimation problems, where specific projections are chosen to reach a target in 3D space. In this study, we construct a framework for fluoroscopic pose estimation and compare alternative loss functions and volumetric scene representations. We first develop a differentiable projection (DiffProj) algorithm for the efficient computation of Digitally Reconstructed Radiographs (DRRs) from either Cone-Beam Computerized Tomography (CBCT) or neural scene representations. We then introduce two innovative neural scene representations, Neural Tuned Tomography (NeTT) and masked Neural Radiance Fields (mNeRF). Pose estimation is performed within the framework by iterative gradient descent, using loss functions that quantify the image discrepancy between the synthesized DRR and the ground-truth target fluoroscopic X-ray image. We compare the alternative loss functions and volumetric scene representations on a dataset of 50 cranial tomographic X-ray sequences. We find that Mutual Information significantly outperforms the alternative loss functions for pose estimation, avoiding entrapment in local optima. The discrete (CBCT) and neural (NeTT and mNeRF) volumetric scene representations yield comparable performance (3D angle errors, mean ≤ 3.2° and 90% quantile ≤ 3.4°); however, the neural scene representations incur a considerable computational expense to train.
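The abstract does not say how Mutual Information is estimated, so the following is only a minimal sketch of one common choice, a joint-histogram estimator, written in plain Python. The function name `mutual_information`, the bin count, and the toy "images" are assumptions made for illustration and are not taken from the paper. It computes $I(A;B) = \sum_{a,b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}$ over binned intensities. As written it is not differentiable, so a gradient-descent framework like the one described would need a smooth variant (for example, soft binning).

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, lo=0.0, hi=1.0):
    """Histogram-based mutual information (in nats) of two images.

    Hypothetical helper for illustration: images are flat sequences of
    intensities, expected to lie in [lo, hi].
    """
    if len(img_a) != len(img_b) or len(img_a) == 0:
        raise ValueError("images must be non-empty and equally sized")

    def to_bin(value):
        # Map an intensity to a bin index, clamping out-of-range values.
        index = int((value - lo) / (hi - lo) * bins)
        return max(0, min(index, bins - 1))

    n = len(img_a)
    # Joint histogram over (bin of A, bin of B) pairs.
    joint = Counter((to_bin(a), to_bin(b)) for a, b in zip(img_a, img_b))

    # Marginal histograms obtained by summing the joint counts.
    marg_a, marg_b = Counter(), Counter()
    for (i, j), count in joint.items():
        marg_a[i] += count
        marg_b[j] += count

    mi = 0.0
    for (i, j), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), expressed with raw counts.
        mi += (count / n) * math.log(count * n / (marg_a[i] * marg_b[j]))
    return mi


# Toy 64-pixel "target" with 8 equally populated intensity levels.
target = [(i % 8) / 8 + 0.05 for i in range(64)]
inverted = [1.0 - v for v in target]  # same structure, reversed contrast
flat = [0.5] * 64                     # carries no information about target

print(round(mutual_information(target, target), 4))    # log(8) ≈ 2.0794
print(round(mutual_information(target, inverted), 4))  # also ≈ 2.0794
print(round(mutual_information(target, flat), 4))      # 0.0
```

In this toy case the contrast-inverted image scores the same as the identical one, whereas a pixel-wise loss such as mean squared error would penalize it heavily. That insensitivity to the intensity mapping is one plausible reason a Mutual Information loss suits comparisons between synthesized DRRs and real fluoroscopic images, although the abstract itself reports only the empirical result.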

