{"title":"基于供应商不确定视觉转换器的人工智能用于经口胆道镜检查:与卷积神经网络和内窥镜医师比较胆道狭窄的诊断性能。","authors":"Ryosuke Sato, Kazuyuki Matsumoto, Masahiro Tomiya, Takayoshi Tanimoto, Akimitsu Ohto, Kentaro Oki, Satoshi Kajitani, Tatsuya Kikuchi, Akihiro Matsumi, Kazuya Miyamoto, Yuki Fujii, Daisuke Uchida, Koichiro Tsutsumi, Shigeru Horiguchi, Yoshiro Kawahara, Motoyuki Otsuka","doi":"10.1111/den.70028","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Accurate diagnosis of biliary strictures remains challenging. This study aimed to develop an artificial intelligence (AI) system for peroral cholangioscopy (POCS) using a Vision Transformer (ViT) architecture and to evaluate its performance compared to different vendor devices, conventional convolutional neural networks (CNNs), and endoscopists.</p><p><strong>Methods: </strong>We retrospectively analyzed 125 patients with indeterminate biliary strictures who underwent POCS between 2012 and 2024. AI models including the ViT architecture and two established CNN architectures were developed using images from CHF-B260 or B290 (CHF group; Olympus Medical) and SpyScope DS or DS II (Spy group; Boston Scientific) systems via a patient-level, 3-fold cross-validation. For a direct comparison against endoscopists, a balanced 440-image test set, containing an equal number of images from each vendor, was used for a blinded evaluation.</p><p><strong>Results: </strong>The 3-fold cross-validation on the entire 2062-image dataset yielded a robust accuracy of 83.9% (95% confidence interval (CI), 80.9-86.7) for the ViT model. The model's accuracy was consistent between CHF (82.7%) and Spy (86.8%, p = 0.198) groups, and its performance was comparable to the evaluated conventional CNNs. On the 440-image test set, the ViT's accuracy of 78.4% (95% CI, 72.5-83.8) was comparable to that of expert endoscopists (82.0%, p = 0.148) and non-experts (73.0%, p = 0.066), with no statistically significant differences observed.</p><p><strong>Conclusions: </strong>The novel ViT-based AI model demonstrated high vendor-agnostic diagnostic accuracy across multiple POCS systems, achieving performance comparable to conventional CNNs and endoscopists evaluated in this study.</p>","PeriodicalId":72813,"journal":{"name":"Digestive endoscopy : official journal of the Japan Gastroenterological Endoscopy Society","volume":" ","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Vendor-Agnostic Vision Transformer-Based Artificial Intelligence for Peroral Cholangioscopy: Diagnostic Performance in Biliary Strictures Compared With Convolutional Neural Networks and Endoscopists.\",\"authors\":\"Ryosuke Sato, Kazuyuki Matsumoto, Masahiro Tomiya, Takayoshi Tanimoto, Akimitsu Ohto, Kentaro Oki, Satoshi Kajitani, Tatsuya Kikuchi, Akihiro Matsumi, Kazuya Miyamoto, Yuki Fujii, Daisuke Uchida, Koichiro Tsutsumi, Shigeru Horiguchi, Yoshiro Kawahara, Motoyuki Otsuka\",\"doi\":\"10.1111/den.70028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>Accurate diagnosis of biliary strictures remains challenging. 
This study aimed to develop an artificial intelligence (AI) system for peroral cholangioscopy (POCS) using a Vision Transformer (ViT) architecture and to evaluate its performance compared to different vendor devices, conventional convolutional neural networks (CNNs), and endoscopists.</p><p><strong>Methods: </strong>We retrospectively analyzed 125 patients with indeterminate biliary strictures who underwent POCS between 2012 and 2024. AI models including the ViT architecture and two established CNN architectures were developed using images from CHF-B260 or B290 (CHF group; Olympus Medical) and SpyScope DS or DS II (Spy group; Boston Scientific) systems via a patient-level, 3-fold cross-validation. For a direct comparison against endoscopists, a balanced 440-image test set, containing an equal number of images from each vendor, was used for a blinded evaluation.</p><p><strong>Results: </strong>The 3-fold cross-validation on the entire 2062-image dataset yielded a robust accuracy of 83.9% (95% confidence interval (CI), 80.9-86.7) for the ViT model. The model's accuracy was consistent between CHF (82.7%) and Spy (86.8%, p = 0.198) groups, and its performance was comparable to the evaluated conventional CNNs. On the 440-image test set, the ViT's accuracy of 78.4% (95% CI, 72.5-83.8) was comparable to that of expert endoscopists (82.0%, p = 0.148) and non-experts (73.0%, p = 0.066), with no statistically significant differences observed.</p><p><strong>Conclusions: </strong>The novel ViT-based AI model demonstrated high vendor-agnostic diagnostic accuracy across multiple POCS systems, achieving performance comparable to conventional CNNs and endoscopists evaluated in this study.</p>\",\"PeriodicalId\":72813,\"journal\":{\"name\":\"Digestive endoscopy : official journal of the Japan Gastroenterological Endoscopy Society\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2025-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Digestive endoscopy : official journal of the Japan Gastroenterological Endoscopy Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1111/den.70028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digestive endoscopy : official journal of the Japan Gastroenterological Endoscopy Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/den.70028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Vendor-Agnostic Vision Transformer-Based Artificial Intelligence for Peroral Cholangioscopy: Diagnostic Performance in Biliary Strictures Compared With Convolutional Neural Networks and Endoscopists.
Objectives: Accurate diagnosis of biliary strictures remains challenging. This study aimed to develop an artificial intelligence (AI) system for peroral cholangioscopy (POCS) based on a Vision Transformer (ViT) architecture and to evaluate its performance across devices from different vendors and against conventional convolutional neural networks (CNNs) and endoscopists.
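As a point of reference for the architecture named above, the following is a minimal sketch (not the authors' implementation) of adapting an ImageNet-pretrained Vision Transformer to a two-class stricture classification task; the torchvision backbone, input size, and head replacement are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

def build_vit_classifier(num_classes: int = 2) -> nn.Module:
    # Start from an ImageNet-pretrained ViT-B/16 backbone (illustrative choice).
    model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
    # Swap the classification head for a two-class linear layer
    # (e.g., benign vs. malignant stricture).
    in_features = model.heads.head.in_features
    model.heads.head = nn.Linear(in_features, num_classes)
    return model

model = build_vit_classifier()
logits = model(torch.randn(1, 3, 224, 224))  # ViT-B/16 expects 224x224 RGB input; output shape (1, 2)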
Methods: We retrospectively analyzed 125 patients with indeterminate biliary strictures who underwent POCS between 2012 and 2024. AI models based on the ViT architecture and two established CNN architectures were developed with patient-level 3-fold cross-validation, using images from the CHF-B260 or B290 (CHF group; Olympus Medical) and SpyScope DS or DS II (Spy group; Boston Scientific) systems. For a direct comparison against endoscopists, a balanced 440-image test set containing an equal number of images from each vendor was used for a blinded evaluation.
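A minimal sketch of the patient-level 3-fold split described above, assuming scikit-learn's GroupKFold so that all images from one patient fall into the same fold; the toy arrays and variable names are illustrative, not the study's data pipeline.

import numpy as np
from sklearn.model_selection import GroupKFold

# One entry per cholangioscopy image; patient_ids records which patient each image came from.
image_ids   = np.array([f"img_{i:03d}" for i in range(12)])
labels      = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
patient_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

gkf = GroupKFold(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(gkf.split(image_ids, labels, groups=patient_ids)):
    # Patient-level split: no patient contributes images to both partitions of a fold.
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
    print(f"fold {fold}: {len(train_idx)} train images, {len(test_idx)} test images")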
Results: The 3-fold cross-validation on the entire 2062-image dataset yielded a robust accuracy of 83.9% (95% confidence interval [CI], 80.9-86.7) for the ViT model. The model's accuracy was consistent between the CHF (82.7%) and Spy (86.8%, p = 0.198) groups, and its performance was comparable to that of the evaluated conventional CNNs. On the 440-image test set, the ViT's accuracy of 78.4% (95% CI, 72.5-83.8) was comparable to that of expert endoscopists (82.0%, p = 0.148) and non-experts (73.0%, p = 0.066), with no statistically significant differences observed.
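The abstract does not state how the 95% CIs for accuracy were derived; shown below is a hedged sketch of one common approach, a nonparametric percentile bootstrap over image-level predictions, applied to toy data rather than the study's results.

import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=440)                          # toy ground truth for a 440-image test set
y_pred = np.where(rng.random(440) < 0.80, y_true, 1 - y_true)  # predictions correct ~80% of the time

correct = (y_true == y_pred).astype(float)
accuracy = correct.mean()

# Percentile bootstrap: resample the per-image correctness indicators with replacement.
boot = [rng.choice(correct, size=correct.size, replace=True).mean() for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy {accuracy:.1%} (95% CI {lo:.1%}-{hi:.1%})")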
Conclusions: The novel ViT-based AI model demonstrated high vendor-agnostic diagnostic accuracy across multiple POCS systems, achieving performance comparable to conventional CNNs and endoscopists evaluated in this study.