{"title":"[Deep Learning Approaches to Address the Shortage of Observers].","authors":"Nariaki Tabata, Tetsuya Ijichi, Masaya Tominaga, Kazunori Kitajima, Shuto Okaba, Lisa Sonoda, Shinichi Katou, Tomoya Masumoto, Asami Obata, Yuna Kawahara, Toshirou Inoue, Tadamitsu Ideguchi","doi":"10.6009/jjrt.25-1554","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study developed a deep learning-based artificial intelligence (AI) observer to address the shortage of skilled human observers and evaluated the impact of substituting human observers with AI.</p><p><strong>Methods: </strong>We used a CT system (Aquilion Prime SP; Canon Medical Systems, Tochigi) and modules CTP682 and CTP712 to scan the phantom (Catphan 700; Toyo Medic, Tokyo). The imaging conditions were set to a tube voltage of 120 kV and tube currents of 200, 160, 120, 80, 40, and 20 mA. Each condition was scanned twice, resulting in a total of 24 images. After the paired comparison experiment with 5 observers, deep learning models based on VGG19 and VGG16 were trained. We evaluated the variance, including both human and AI observers, and examined the impact of replacing humans with AI on the average degree of preference and statistical significance. These evaluations were conducted both when the training and assessments were from the same module and when they were from different modules.</p><p><strong>Results: </strong>Variance ranged from 0.085 to 0.177 (mean: 0.124). Despite using different modules for training and evaluation, the variance remained consistent, indicating that the results are independent of the training data. The average degree of preference and image rankings were nearly identical. Between 200 mA and 160 mA, AI results differed from human results in terms of statistical significance, though the difference was minimal. The discrepancy arose from differences in observations between humans and AI, yet it fell within the expected range of variation typically observed among human observers.</p><p><strong>Conclusion: </strong>Our results suggest that replacing human observers with AI has a minimal impact and may help alleviate observer shortages. The main limitation is the inability to modify evaluation criteria or stages with the trained models.</p>","PeriodicalId":74309,"journal":{"name":"Nihon Hoshasen Gijutsu Gakkai zasshi","volume":"81 7","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nihon Hoshasen Gijutsu Gakkai zasshi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.6009/jjrt.25-1554","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Purpose: This study developed a deep learning-based artificial intelligence (AI) observer to address the shortage of skilled human observers and evaluated the impact of substituting human observers with AI.
Methods: We scanned the CTP682 and CTP712 modules of a phantom (Catphan 700; Toyo Medic, Tokyo) with a CT system (Aquilion Prime SP; Canon Medical Systems, Tochigi). The imaging conditions were a tube voltage of 120 kV and tube currents of 200, 160, 120, 80, 40, and 20 mA. Each condition was scanned twice for each module, resulting in a total of 24 images. After a paired-comparison experiment with five human observers, deep learning models based on VGG19 and VGG16 were trained. We evaluated the observer variance with both human and AI observers included, and examined the impact of replacing humans with AI on the average degree of preference and on statistical significance. These evaluations were conducted both when the training and assessment images came from the same module and when they came from different modules.
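The abstract does not describe the implementation of the AI observer. The following is a minimal sketch, under stated assumptions, of how a VGG16-based network could be trained on pairwise preference judgments in PyTorch; the architecture details, image size, loss, and hyperparameters are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (not the authors' implementation): a VGG16-based "AI observer"
# that, given a pair of CT images, predicts which one a human observer preferred.
# All architecture and training details here are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class PairwisePreferenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained VGG16 backbone used as a shared feature extractor.
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features
        self.pool = vgg.avgpool
        # Each image is reduced to a single image-quality score.
        self.scorer = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),
        )

    def score(self, x):
        return self.scorer(self.pool(self.features(x)))

    def forward(self, img_a, img_b):
        # Positive output -> image A preferred, negative -> image B preferred.
        return self.score(img_a) - self.score(img_b)

model = PairwisePreferenceNet()
criterion = nn.BCEWithLogitsLoss()  # label 1.0 if the observer chose image A
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One hypothetical training step on a batch of image pairs (3-channel, 224x224).
img_a = torch.randn(4, 3, 224, 224)
img_b = torch.randn(4, 3, 224, 224)
labels = torch.tensor([[1.0], [0.0], [1.0], [1.0]])
loss = criterion(model(img_a, img_b), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A VGG19-based variant would differ only in the backbone; sharing one scoring head across both images of a pair keeps the network's judgment symmetric, which matches the paired-comparison setting described above.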
Results: The variance ranged from 0.085 to 0.177 (mean, 0.124). Even when different modules were used for training and evaluation, the variance remained consistent, indicating that the results do not depend on the training data. The average degrees of preference and the image rankings were nearly identical for human and AI observers. Between 200 mA and 160 mA, the AI results differed from the human results in statistical significance, although the difference was minimal. This discrepancy arose from differences in judgment between the human and AI observers, yet it fell within the range of variation typically observed among human observers.
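The abstract does not state how the average degree of preference and the variance were computed. The sketch below illustrates one plausible reading, in the spirit of a Scheffé-type paired-comparison analysis; the matrix values, the six-observer setup (five humans plus one AI), and the simple variance measure are made-up placeholders, not the authors' data or analysis code.

```python
# Illustrative sketch only: average degree of preference and a simple
# between-observer variance from a paired-comparison matrix.
import numpy as np

# pref[i, j] = mean preference score when image i is compared with image j,
# averaged over observers (e.g., +1 if i is preferred, -1 if j is preferred).
# Random placeholder values for 6 tube-current settings.
rng = np.random.default_rng(0)
n_images = 6
pref = rng.uniform(-1.0, 1.0, size=(n_images, n_images))
pref = (pref - pref.T) / 2.0          # enforce antisymmetry: pref[i, j] = -pref[j, i]
np.fill_diagonal(pref, 0.0)

# Average degree of preference of image i: mean of its scores against all others.
avg_preference = pref.sum(axis=1) / (n_images - 1)
ranking = np.argsort(avg_preference)[::-1]  # best-rated image first

print("average degree of preference:", np.round(avg_preference, 3))
print("ranking (image indices, best first):", ranking)

# A simple disagreement measure for one image pair: variance of per-observer
# scores, here simulated for 5 human observers plus 1 AI observer.
per_observer_scores = rng.choice([-1.0, 1.0], size=6)
observer_variance = per_observer_scores.var(ddof=1)
print("between-observer variance for one pair:", round(observer_variance, 3))
```

Under this reading, substituting the AI for a human observer changes one row of per-observer scores, so its effect shows up directly in the variance, the average degree of preference, and the resulting image ranking.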
Conclusion: Our results suggest that replacing human observers with AI has minimal impact and may help alleviate the shortage of observers. The main limitation is that the evaluation criteria and rating stages cannot be changed once the models have been trained.