{"title":"Real-time ultrasound AR 3D visualization toward better topological structure perception for hepatobiliary surgery.","authors":"Yuqi Ji, Tianqi Huang, Yutong Wu, Ruiyang Li, Pengfei Wang, Jiahong Dong, Honegen Liao","doi":"10.1007/s11548-024-03273-1","DOIUrl":"https://doi.org/10.1007/s11548-024-03273-1","url":null,"abstract":"<p><strong>Purpose: </strong>Ultrasound serves as a crucial intraoperative imaging tool for hepatobiliary surgeons, enabling the identification of complex anatomical structures like blood vessels, bile ducts, and lesions. However, the reliance on manual mental reconstruction of 3D topologies from 2D ultrasound images presents significant challenges, leading to a pressing need for tools to assist surgeons with real-time identification of 3D topological anatomy.</p><p><strong>Methods: </strong>We propose a real-time ultrasound AR 3D visualization method for intraoperative 2D ultrasound imaging. Our system leverages backward alpha blending to integrate multi-planar ultrasound data effectively. To ensure continuity between 2D ultrasound planes, we employ spatial smoothing techniques to interpolate the widely spaced ultrasound planes. A dynamic 3D transfer function is also developed to enhance spatial representation through color differentiation.</p><p><strong>Results: </strong>Comparative experiments involving our AR visualization of 3D ultrasound, alongside AR visualization of 2D ultrasound and 2D visualization of 3D ultrasound, demonstrated that the proposed method significantly reduced operational time(110.25 ± 27.83 s compared to 292 ± 146.63 s and 365.25 ± 131.62 s), improved depth perception and comprehension of complex topologies, contributing to reduced pressure and increased personal satisfaction among users.</p><p><strong>Conclusion: </strong>Quantitative experimental results and feedback from both novice and experienced physicians highlight our system's exceptional ability to enhance the understanding of complex topological anatomy. This improvement is crucial for accurate ultrasound diagnosis and informed surgical decision-making, underscoring the system's clinical applicability.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive infrared patterns for microscopic surface reconstructions.","authors":"Srdjan Milosavljevic, Zoltan Bardosi, Yusuf Oezbek, Wolfgang Freysinger","doi":"10.1007/s11548-024-03242-8","DOIUrl":"https://doi.org/10.1007/s11548-024-03242-8","url":null,"abstract":"<p><strong>Purpose: </strong>Multi-zoom microscopic surface reconstructions of operating sites, especially in ENT surgeries, would allow multimodal image fusion for determining the amount of resected tissue, for recognizing critical structures, and novel tools for intraoperative quality assurance. State-of-the-art three-dimensional model creation of the surgical scene is challenged by the surgical environment, illumination, and the homogeneous structures of skin, muscle, bones, etc., that lack invariant features for stereo reconstruction.</p><p><strong>Methods: </strong>An adaptive near-infrared pattern projector illuminates the surgical scene with optimized patterns to yield accurate dense multi-zoom stereoscopic surface reconstructions. The approach does not impact the clinical workflow. The new method is compared to state-of-the-art approaches and is validated by determining its reconstruction errors relative to a high-resolution 3D-reconstruction of CT data.</p><p><strong>Results: </strong>200 surface reconstructions were generated for 5 zoom levels with 10 reconstructions for each object illumination method (standard operating room light, microscope light, random pattern and adaptive NIR pattern). For the adaptive pattern, the surface reconstruction errors ranged from 0.5 to 0.7 mm, as compared to 1-1.9 mm for the other approaches. The local reconstruction differences are visualized in heat maps.</p><p><strong>Conclusion: </strong>Adaptive near-infrared (NIR) pattern projection in microscopic surgery allows dense and accurate microscopic surface reconstructions for variable zoom levels of small and homogeneous surfaces. This could potentially aid in microscopic interventions at the lateral skull base and potentially open up new possibilities for combining quantitative intraoperative surface reconstructions with preoperative radiologic imagery.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards multimodal visualization of esophageal motility: fusion of manometry, impedance, and videofluoroscopic image sequences.","authors":"Alexander Geiger, Lukas Bernhard, Florian Gassert, Hubertus Feußner, Dirk Wilhelm, Helmut Friess, Alissa Jell","doi":"10.1007/s11548-024-03265-1","DOIUrl":"https://doi.org/10.1007/s11548-024-03265-1","url":null,"abstract":"<p><strong>Purpose: </strong>Dysphagia is the inability or difficulty to swallow normally. Standard procedures for diagnosing the exact disease are, among others, X-ray videofluoroscopy, manometry and impedance examinations, usually performed consecutively. In order to gain more insights, ongoing research is aiming to collect these different modalities at the same time, with the goal to present them in a joint visualization. One idea to create a combined view is the projection of the manometry and impedance values onto the right location in the X-ray images. This requires to identify the exact sensor locations in the images.</p><p><strong>Methods: </strong>This work gives an overview of the challenges associated with the sensor detection task and proposes a robust approach to detect the sensors in X-ray image sequences, ultimately allowing to project the manometry and impedance values onto the right location in the images.</p><p><strong>Results: </strong>The developed sensor detection approach is evaluated on a total of 14 sequences from different patients, achieving a F1-score of 86.36%. To demonstrate the robustness of the approach, another study is performed by adding different levels of noise to the images, with the performance of our sensor detection method only slightly decreasing in these scenarios. This robust sensor detection provides the basis to accurately project manometry and impedance values onto the images, allowing to create a multimodal visualization of the swallow process. The resulting visualizations are evaluated qualitatively by domain experts, indicating a great benefit of this proposed fused visualization approach.</p><p><strong>Conclusion: </strong>Using our preprocessing and sensor detection method, we show that the sensor detection task can be successfully approached with high accuracy. This allows to create a novel, multimodal visualization of esophageal motility, helping to provide more insights into swallow disorders of patients.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for three-dimensional statistical shape modeling of the proximal femur in Legg-Calvé-Perthes disease.","authors":"Luke G Johnson, Joseph D Mozingo, Penny R Atkins, Seaton Schwab, Alan Morris, Shireen Y Elhabian, David R Wilson, Harry K W Kim, Andrew E Anderson","doi":"10.1007/s11548-024-03272-2","DOIUrl":"10.1007/s11548-024-03272-2","url":null,"abstract":"<p><strong>Purpose: </strong>The pathomorphology of Legg-Calvé-Perthes disease (LCPD) is a key contributor to poor long-term outcomes such as hip pain, femoroacetabular impingement, and early-onset osteoarthritis. Plain radiographs, commonly used for research and in the clinic, cannot accurately represent the full extent of LCPD deformity. The purpose of this study was to develop and evaluate a methodological framework for three-dimensional (3D) statistical shape modeling (SSM) of the proximal femur in LCPD.</p><p><strong>Methods: </strong>We developed a framework consisting of three core steps: segmentation, surface mesh preparation, and particle-based correspondence. The framework aims to address challenges in modeling this rare condition, characterized by highly heterogeneous deformities across a wide age range and small sample sizes. We evaluated this framework by producing a SSM from clinical magnetic resonance images of 13 proximal femurs with LCPD deformity from 11 patients between the ages of six and 12 years.</p><p><strong>Results: </strong>After removing differences in scale and pose, the dominant shape modes described morphological features characteristic of LCPD, including a broad and flat femoral head, high-riding greater trochanter, and reduced neck-shaft angle. The first four shape modes were chosen for the evaluation of the model's performance, together describing 87.5% of the overall cohort variance. The SSM was generalizable to unfamiliar examples with an average point-to-point reconstruction error below 1mm. We observed strong Spearman rank correlations (up to 0.79) between some shape modes, 3D measurements of femoral head asphericity, and clinical radiographic metrics.</p><p><strong>Conclusion: </strong>In this study, we present a framework, based on SSM, for the objective description of LCPD deformity in three dimensions. Our methods can accurately describe overall shape variation using a small number of parameters, and are a step toward a widely accepted, objective 3D quantification of LCPD deformity.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An intuitive guidewire control mechanism for robotic intervention.","authors":"Rohit Dey, Yichen Guo, Yang Liu, Ajit Puri, Luis Savastano, Yihao Zheng","doi":"10.1007/s11548-024-03279-9","DOIUrl":"https://doi.org/10.1007/s11548-024-03279-9","url":null,"abstract":"<p><strong>Purpose: </strong>Teleoperated Interventional Robotic systems (TIRs) are developed to reduce radiation exposure and physical stress of the physicians and enhance device manipulation accuracy and stability. Nevertheless, TIRs are not widely adopted, partly due to the lack of intuitive control interfaces. Current TIR interfaces like joysticks, keyboards, and touchscreens differ significantly from traditional manual techniques, resulting in a shallow, longer learning curve. To this end, this research introduces a novel control mechanism for intuitive operation and seamless adoption of TIRs.</p><p><strong>Methods: </strong>An off-the-shelf medical torque device augmented with a micro-electromagnetic tracker was proposed as the control interface to preserve the tactile sensation and muscle memory integral to interventionalists' proficiency. The control inputs to drive the TIR were extracted via real-time motion mapping of the interface. To verify the efficacy of the proposed control mechanism to accurately operate the TIR, evaluation experiments using industrial grade encoders were conducted.</p><p><strong>Results: </strong>A mean tracking error of 0.32 ± 0.12 mm in linear and 0.54 ± 0.07° in angular direction were achieved. The time lag in tracking was found to be 125 ms on average using pade approximation. Ergonomically, the developed control interface is 3.5 mm diametrically larger, and 4.5 g. heavier compared to traditional torque devices.</p><p><strong>Conclusion: </strong>With uncanny resemblance to traditional torque devices while maintaining results comparable to state-of-the-art commercially available TIRs, this research successfully provides an intuitive control interface for potential wider clinical adoption of robot-assisted interventions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142382319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph neural networks in multi-stained pathological imaging: extended comparative analysis of Radiomic features.","authors":"Luis Carlos Rivera Monroy, Leonhard Rist, Christian Ostalecki, Andreas Bauer, Julio Vera, Katharina Breininger, Andreas Maier","doi":"10.1007/s11548-024-03277-x","DOIUrl":"https://doi.org/10.1007/s11548-024-03277-x","url":null,"abstract":"<p><strong>Purpose: </strong>This study investigates the application of Radiomic features within graph neural networks (GNNs) for the classification of multiple-epitope-ligand cartography (MELC) pathology samples. It aims to enhance the diagnosis of often misdiagnosed skin diseases such as eczema, lymphoma, and melanoma. The novel contribution lies in integrating Radiomic features with GNNs and comparing their efficacy against traditional multi-stain profiles.</p><p><strong>Methods: </strong>We utilized GNNs to process multiple pathological slides as cell-level graphs, comparing their performance with XGBoost and Random Forest classifiers. The analysis included two feature types: multi-stain profiles and Radiomic features. Dimensionality reduction techniques such as UMAP and t-SNE were applied to optimize the feature space, and graph connectivity was based on spatial and feature closeness.</p><p><strong>Results: </strong>Integrating Radiomic features into spatially connected graphs significantly improved classification accuracy over traditional models. The application of UMAP further enhanced the performance of GNNs, particularly in classifying diseases with similar pathological features. The GNN model outperformed baseline methods, demonstrating its robustness in handling complex histopathological data.</p><p><strong>Conclusion: </strong>Radiomic features processed through GNNs show significant promise for multi-disease classification, improving diagnostic accuracy. This study's findings suggest that integrating advanced imaging analysis with graph-based modeling can lead to better diagnostic tools. Future research should expand these methods to a wider range of diseases to validate their generalizability and effectiveness.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142382320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A bronchoscopic navigation method based on neural radiation fields.","authors":"Lifeng Zhu, Jianwei Zheng, Cheng Wang, Junhong Jiang, Aiguo Song","doi":"10.1007/s11548-024-03243-7","DOIUrl":"10.1007/s11548-024-03243-7","url":null,"abstract":"<p><strong>Purpose: </strong>We introduce a novel approach for bronchoscopic navigation that leverages neural radiance fields (NeRF) to passively locate the endoscope solely from bronchoscopic images. This approach aims to overcome the limitations and challenges of current bronchoscopic navigation tools that rely on external infrastructures or require active adjustment of the bronchoscope.</p><p><strong>Methods: </strong>To address the challenges, we leverage NeRF for bronchoscopic navigation, enabling passive endoscope localization from bronchoscopic images. We develop a two-stage pipeline: offline training using preoperative data and online passive pose estimation during surgery. To enhance performance, we employ Anderson acceleration and incorporate semantic appearance transfer to deal with the sim-to-real gap between training and inference stages.</p><p><strong>Results: </strong>We assessed the viability of our approach by conducting tests on virtual bronchscopic images and a physical phantom against the SLAM-based methods. The average rotation error in our virtual dataset is about 3.18 <math><mmultiscripts><mrow></mrow> <mrow></mrow> <mo>∘</mo></mmultiscripts> </math> and the translation error is around 4.95 mm. On the physical phantom test, the average rotation and translation error are approximately 5.14 <math><mmultiscripts><mrow></mrow> <mrow></mrow> <mo>∘</mo></mmultiscripts> </math> and 13.12 mm.</p><p><strong>Conclusion: </strong>Our NeRF-based bronchoscopic navigation method eliminates reliance on external infrastructures and active adjustments, offering promising advancements in bronchoscopic navigation. Experimental validation on simulation and real-world phantom models demonstrates its efficacy in addressing challenges like low texture and challenging lighting conditions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141903518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A usability analysis of augmented reality and haptics for surgical planning.","authors":"Negar Kazemipour, Amir Hooshiar, Marta Kersten-Oertel","doi":"10.1007/s11548-024-03207-x","DOIUrl":"10.1007/s11548-024-03207-x","url":null,"abstract":"<p><strong>Purpose: </strong>Proper visualization and interaction with complex anatomical data can improve understanding, allowing for more intuitive surgical planning. The goal of our work was to study what the most intuitive yet practical platforms for interacting with 3D medical data are in the context of surgical planning.</p><p><strong>Methods: </strong>We compared planning using a monitor and mouse, a monitor with a haptic device, and an augmented reality (AR) head-mounted display which uses a gesture-based interaction. To determine the most intuitive system, two user studies, one with novices and one with experts, were conducted. The studies involved planning of three scenarios: (1) heart valve repair, (2) hip tumor resection, and (3) pedicle screw placement. Task completion time, NASA Task Load Index and system-specific questionnaires were used for the evaluation.</p><p><strong>Results: </strong>Both novices and experts preferred the AR system for pedicle screw placement. Novices preferred the haptic system for hip tumor planning, while experts preferred the mouse and keyboard. In the case of heart valve planning, novices preferred the AR system but there was no clear preference for experts. Both groups reported that AR provides the best spatial depth perception.</p><p><strong>Conclusion: </strong>The results of the user studies suggest that different surgical cases may benefit from varying interaction and visualization methods. For example, for planning surgeries with implants and instrumentations, mixed reality could provide better 3D spatial perception, whereas using landmarks for delineating specific targets may be more effective using a traditional 2D interface.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141472451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Needle tracking in low-resolution ultrasound volumes using deep learning.","authors":"Sarah Grube, Sarah Latus, Finn Behrendt, Oleksandra Riabova, Maximilian Neidhardt, Alexander Schlaefer","doi":"10.1007/s11548-024-03234-8","DOIUrl":"10.1007/s11548-024-03234-8","url":null,"abstract":"<p><strong>Purpose: </strong>Clinical needle insertion into tissue, commonly assisted by 2D ultrasound imaging for real-time navigation, faces the challenge of precise needle and probe alignment to reduce out-of-plane movement. Recent studies investigate 3D ultrasound imaging together with deep learning to overcome this problem, focusing on acquiring high-resolution images to create optimal conditions for needle tip detection. However, high-resolution also requires a lot of time for image acquisition and processing, which limits the real-time capability. Therefore, we aim to maximize the US volume rate with the trade-off of low image resolution. We propose a deep learning approach to directly extract the 3D needle tip position from sparsely sampled US volumes.</p><p><strong>Methods: </strong>We design an experimental setup with a robot inserting a needle into water and chicken liver tissue. In contrast to manual annotation, we assess the needle tip position from the known robot pose. During insertion, we acquire a large data set of low-resolution volumes using a 16 <math><mo>×</mo></math> 16 element matrix transducer with a volume rate of 4 Hz. We compare the performance of our deep learning approach with conventional needle segmentation.</p><p><strong>Results: </strong>Our experiments in water and liver show that deep learning outperforms the conventional approach while achieving sub-millimeter accuracy. We achieve mean position errors of 0.54 mm in water and 1.54 mm in liver for deep learning.</p><p><strong>Conclusion: </strong>Our study underlines the strengths of deep learning to predict the 3D needle positions from low-resolution ultrasound volumes. This is an important milestone for real-time needle navigation, simplifying the alignment of needle and ultrasound probe and enabling a 3D motion analysis.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11442564/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformers for colorectal cancer segmentation in CT imaging.","authors":"Georg Hille, Pavan Tummala, Lena Spitz, Sylvia Saalfeld","doi":"10.1007/s11548-024-03217-9","DOIUrl":"10.1007/s11548-024-03217-9","url":null,"abstract":"<p><strong>Purpose: </strong>Most recently transformer models became the state of the art in various medical image segmentation tasks and challenges, outperforming most of the conventional deep learning approaches. Picking up on that trend, this study aims at applying various transformer models to the highly challenging task of colorectal cancer (CRC) segmentation in CT imaging and assessing how they hold up to the current state-of-the-art convolutional neural network (CNN), the nnUnet. Furthermore, we wanted to investigate the impact of the network size on the resulting accuracies, since transformer models tend to be significantly larger than conventional network architectures.</p><p><strong>Methods: </strong>For this purpose, six different transformer models, with specific architectural advancements and network sizes were implemented alongside the aforementioned nnUnet and were applied to the CRC segmentation task of the medical segmentation decathlon.</p><p><strong>Results: </strong>The best results were achieved with the Swin-UNETR, D-Former, and VT-Unet, each transformer models, with a Dice similarity coefficient (DSC) of 0.60, 0.59 and 0.59, respectively. Therefore, the current state-of-the-art CNN, the nnUnet could be outperformed by transformer architectures regarding this task. Furthermore, a comparison with the inter-observer variability (IOV) of approx. 0.64 DSC indicates almost expert-level accuracy. The comparatively low IOV emphasizes the complexity and challenge of CRC segmentation, as well as indicating limitations regarding the achievable segmentation accuracy.</p><p><strong>Conclusion: </strong>As a result of this study, transformer models underline their current upward trend in producing state-of-the-art results also for the challenging task of CRC segmentation. However, with ever smaller advances in total accuracies, as demonstrated in this study by the on par performances of multiple network variants, other advantages like efficiency, low computation demands, or ease of adaption to new tasks become more and more relevant.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141535882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}