Transformers in surgical artificial intelligence: A domain-stratified, study-level narrative review
Andres Bravo MBI, BScH, Sara Razzaq MD, Mohan Murari MS, Aditya Ahuja BSc, Danielle Birchett BS, Lana Schumacher MD
JTCVS Open, volume 30, Article 101597 (April 2026; published online January 22, 2026). DOI: 10.1016/j.xjon.2026.101597
https://www.sciencedirect.com/science/article/pii/S2666273626000203
Abstract
Objective
The objective was to synthesize study-reported outcomes of transformer-based artificial intelligence systems in surgical domains and describe settings where they appear advantageous relative to nontransformer models and human benchmarks.
Methods
We searched PubMed, Embase, IEEE Xplore, ScienceDirect, Google Scholar, arXiv, and the Cochrane Library. Eligible studies evaluated transformer architectures in surgical or perioperative contexts (medical imaging, workflow recognition, prognosis-related modeling, or education) and reported quantitative outcomes. Because tasks and metrics were heterogeneous, we performed a domain-organized narrative synthesis. Where a single study reported transformer (T) and nontransformer (NT) models on the same dataset and metric, we computed within-study differences (Δ = T − NT) and summarized them as medians with interquartile ranges (IQRs), alongside vote counts (T > NT / tie / T < NT). No cross-study pooling or hypothesis testing was performed.
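The paired-comparison summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the PairedComparison record, the summarize function, the tie_tolerance parameter, the example scores, and the quartile method (the default "exclusive" method of Python's statistics.quantiles) are all hypothetical assumptions. The sketch also assumes metrics where higher is better, expressed in percent, so that Δ is in percentage points.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class PairedComparison:
    """Hypothetical record: one study, one dataset, one metric, both model families."""
    study: str
    metric: str
    transformer: float     # transformer (T) score in percent; higher assumed better
    nontransformer: float  # nontransformer (NT) baseline score in percent


def summarize(pairs: list[PairedComparison], tie_tolerance: float = 0.0) -> dict:
    """Median/IQR of within-study deltas plus T>NT / tie / T<NT vote counts.

    Expects at least one paired comparison.
    """
    # Within-study difference, delta = T - NT, in percentage points.
    deltas = [p.transformer - p.nontransformer for p in pairs]
    n = len(deltas)

    # Quartiles need at least two deltas; with one, report no IQR (illustrative only).
    # Using the default 'exclusive' method of statistics.quantiles is an assumption.
    iqr = None
    if n >= 2:
        q1, _, q3 = quantiles(deltas, n=4)
        iqr = q3 - q1

    # Vote counts; |delta| <= tie_tolerance (an assumed parameter) counts as a tie.
    wins = sum(d > tie_tolerance for d in deltas)
    losses = sum(d < -tie_tolerance for d in deltas)
    ties = n - wins - losses

    return {
        "n": n,
        "median_delta": median(deltas),
        "iqr": iqr,
        "votes": (wins, ties, losses),
        "pct_favoring_T": 100 * wins / n,
    }


# Usage with made-up scores (not data from the review).
pairs = [
    PairedComparison("study A", "Dice", 85.0, 83.0),
    PairedComparison("study A", "IoU", 74.0, 74.0),
    PairedComparison("study B", "accuracy", 90.5, 88.0),
    PairedComparison("study C", "F1", 70.0, 71.0),
]
summary = summarize(pairs)
print(summary["median_delta"], summary["votes"])  # 1.0 (2, 1, 1)
```

With only one paired comparison (as reported for prognosis), the sketch returns the single Δ and no IQR. Different quartile conventions can give noticeably different IQRs for small samples, so a real analysis would need to state which one it used.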
Results
Paired comparisons favored transformers in 15 of 20 medical imaging comparisons (75.0%; median Δ, +1.13 percentage points; IQR, 3.70) and in 28 of 34 workflow recognition comparisons (82.4%; median Δ, +1.75 percentage points; IQR, 3.28). Prognosis had sparse paired data (n = 1; Δ, +3.0 percentage points), so this result is illustrative only. In education, transformers were favored overall in 5 of 6 paired comparisons (83.3%), driven by surgery time prediction; results for diagnostic education tasks were mixed. Reported advantages were task dependent and dataset specific; in like-for-like settings, gains were typically in the single-digit percentage-point range.
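As a consistency check on the Results, the reported percentages follow from the vote counts alone: comparisons favoring transformers divided by all paired comparisons in the domain. The share_favoring helper is a hypothetical name introduced for illustration. The medians and IQRs cannot be recomputed this way, because the abstract does not give the per-comparison deltas.

```python
# Vote counts as reported in the Results:
# (comparisons favoring transformers, total paired comparisons in the domain).
reported_votes = {
    "medical imaging": (15, 20),
    "workflow recognition": (28, 34),
    "education": (5, 6),
}


def share_favoring(wins: int, total: int) -> float:
    """Percentage of paired comparisons favoring transformers, rounded to one decimal place."""
    return round(100 * wins / total, 1)


for domain, (wins, total) in reported_votes.items():
    print(f"{domain}: {wins}/{total} = {share_favoring(wins, total)}%")
# medical imaging: 15/20 = 75.0%
# workflow recognition: 28/34 = 82.4%
# education: 5/6 = 83.3%
```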
Conclusions
Transformers frequently match or exceed nontransformer baselines in surgical imaging and workflow tasks, with promising, yet heterogeneously reported, signals in prognosis and education. Translation to dependable clinical/educational impact will require standardized benchmarks, external/prospective validation, transparent comparator reporting (including human baselines), and deployment studies that address real-time operating room constraints and fairness across patient and learner groups.