Transformers in surgical artificial intelligence: A domain-stratified, study-level narrative review

Impact factor: 1.9

JTCVS open. Pub Date: 2026-04-01. Epub Date: 2026-01-22. DOI: 10.1016/j.xjon.2026.101597

Andres Bravo MBI, BScH, Sara Razzaq MD, Mohan Murari MS, Aditya Ahuja BSc, Danielle Birchett BS, Lana Schumacher MD

JTCVS open, volume 30, Article 101597. Publication type: Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S2666273626000203


Abstract

Objective

The objective was to synthesize study-reported outcomes of transformer-based artificial intelligence systems in surgical domains and describe settings where they appear advantageous relative to nontransformer models and human benchmarks.

Methods

We searched major databases: PubMed, Embase, IEEE Xplore, ScienceDirect, Google Scholar, arXiv, and Cochrane Library. Eligible studies evaluated transformer architectures in surgical/perioperative contexts (medical imaging, workflow recognition, prognosis-related modeling, or education) and reported quantitative outcomes. Because tasks and metrics were heterogeneous, we performed a domain-organized narrative synthesis. Where a single study reported both transformer and nontransformer results on the same dataset and metrics, we computed within-study deltas (Δ = Transformers – Nontransformers) and summarized their medians and interquartile ranges alongside vote counts (T>NT/tie/T<NT). No cross-study pooling or hypothesis testing was performed.
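The within-study summary described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `summarize_paired`, the tie tolerance `tol`, the choice of the "inclusive" quartile method, and the example scores are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import median, quantiles


def summarize_paired(pairs, tol=1e-9):
    """Summarize paired (transformer, nontransformer) scores in percentage points.

    Each pair is assumed to come from one study, on the same dataset and metric.
    """
    # Within-study delta: transformer score minus nontransformer score.
    deltas = [t - nt for t, nt in pairs]

    # Vote counts (T>NT / tie / T<NT); `tol` is a hypothetical tie threshold.
    wins = sum(d > tol for d in deltas)
    losses = sum(d < -tol for d in deltas)
    ties = len(deltas) - wins - losses

    # Interquartile range needs at least two deltas; the quartile method
    # is an assumption, as the abstract does not state one.
    if len(deltas) >= 2:
        q1, _, q3 = quantiles(deltas, n=4, method="inclusive")
        iqr = q3 - q1
    else:
        iqr = None

    return {
        "n": len(deltas),
        "wins": wins,
        "ties": ties,
        "losses": losses,
        "median": median(deltas),
        "iqr": iqr,
    }


# Hypothetical scores, not data from the review.
example_pairs = [(90.0, 88.0), (85.0, 85.0), (70.0, 73.0), (95.5, 94.0)]
summary = summarize_paired(example_pairs)
print(summary)
```

Because each delta is computed inside one study, the summary describes the direction and size of reported differences without pooling across studies, which matches the narrative (non-meta-analytic) design stated in the Methods.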

Results

Paired comparisons favored transformers in 15 of 20 (75.0%) medical imaging comparisons (median Δ +1.13 percentage points; interquartile range, 3.70) and in 28 of 34 (82.4%) workflow recognition comparisons (median Δ +1.75 percentage points; interquartile range, 3.28). Prognosis had sparse paired data (n = 1; Δ +3.0 percentage points; illustrative). Education favored transformers overall in 5 of 6 (83.3%) paired comparisons, driven by surgery time prediction; diagnostic education tasks were mixed. Reported advantages were task dependent and dataset specific; gains were typically single-digit percentage points in like-for-like settings.
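As a quick arithmetic check, the vote-count percentages above can be recomputed from the reported counts. The sketch below uses only the counts given in the Results; the variable names are illustrative.

```python
# (comparisons favoring transformers, total paired comparisons), as reported.
reported = {
    "medical imaging": (15, 20),
    "workflow recognition": (28, 34),
    "education": (5, 6),
}

# Share of paired comparisons favoring transformers, in percent, to one decimal.
shares = {
    domain: round(100 * wins / total, 1)
    for domain, (wins, total) in reported.items()
}

for domain, share in shares.items():
    print(f"{domain}: {share}%")
```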

Conclusions

Transformers frequently match or exceed nontransformer baselines in surgical imaging and workflow tasks, with promising, yet heterogeneously reported, signals in prognosis and education. Translation to dependable clinical/educational impact will require standardized benchmarks, external/prospective validation, transparent comparator reporting (including human baselines), and deployment studies that address real-time operating room constraints and fairness across patient and learner groups.

