Clinical Validation of a Generative Artificial Intelligence Model for Chest Radiograph Reporting: A Multicohort Study

Eui Jin Hwang, Jong Hyuk Lee, Woo Hyeon Lim, Won Gi Jeong, Wonju Hong, Jongsoo Park, Seung-Jin Yoo, Hyungjin Kim
{"title":"生成式人工智能胸片报告模型的临床验证:一项多队列研究。","authors":"Eui Jin Hwang, Jong Hyuk Lee, Woo Hyeon Lim, Won Gi Jeong, Wonju Hong, Jongsoo Park, Seung-Jin Yoo, Hyungjin Kim","doi":"10.1148/radiol.250568","DOIUrl":null,"url":null,"abstract":"<p><p>Background Artificial intelligence (AI)-generated radiology reports have become available and require rigorous evaluation. Purpose To evaluate the clinical acceptability of chest radiograph reports generated by an AI algorithm and their accuracy in identifying referable abnormalities. Materials and Methods Chest radiographs from an intensive care unit (ICU), an emergency department, and health checkups were retrospectively collected between January 2020 and December 2022, and outpatient chest radiographs were sourced from a public dataset. An automated report-generating AI algorithm was then applied. A panel of seven thoracic radiologists evaluated the acceptability of generated reports, and acceptability was analyzed using a standard criterion (acceptable without revision or with minor revision) and a stringent criterion (acceptable without revision). Using chest radiographs from three of the contexts (excluding the ICU), AI-generated and radiologist-written reports were compared regarding the acceptability of the reports (generalized linear mixed model) and their sensitivity and specificity for identifying referable abnormalities (McNemar test). The radiologist panel was surveyed to evaluate their perspectives on the potential of AI-generated reports to replace radiologist-written reports. Results The chest radiographs of 1539 individuals (median age, 55 years; 656 male patients, 483 female patients, 400 patients of unknown sex) were included. There was no evidence of a difference in acceptability between AI-generated and radiologist-written reports under the standard criterion (88.4% vs 89.2%; <i>P</i> = .36), but AI-generated reports were less acceptable than radiologist-written reports under the stringent criterion (66.8% vs 75.7%; <i>P</i> < .001). Compared with radiologist-written reports, AI-generated reports identified radiographs with referable abnormalities with greater sensitivity (81.2% vs 59.4%; <i>P</i> < .001) and lower specificity (81.0% vs 93.6%; <i>P</i> < .001). In the survey, most radiologists indicated that AI-generated reports were not yet reliable enough to replace radiologist-written reports. Conclusion AI-generated chest radiograph reports had similar acceptability to radiologist-written reports, although a substantial proportion of AI-generated reports required minor revision. © RSNA, 2025 <i>Supplemental material is available for this article.</i> See also the editorial by Wu and Seo in this issue.</p>","PeriodicalId":20896,"journal":{"name":"Radiology","volume":"316 3","pages":"e250568"},"PeriodicalIF":15.2000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Clinical Validation of a Generative Artificial Intelligence Model for Chest Radiograph Reporting: A Multicohort Study.\",\"authors\":\"Eui Jin Hwang, Jong Hyuk Lee, Woo Hyeon Lim, Won Gi Jeong, Wonju Hong, Jongsoo Park, Seung-Jin Yoo, Hyungjin Kim\",\"doi\":\"10.1148/radiol.250568\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Background Artificial intelligence (AI)-generated radiology reports have become available and require rigorous evaluation. 
Purpose To evaluate the clinical acceptability of chest radiograph reports generated by an AI algorithm and their accuracy in identifying referable abnormalities. Materials and Methods Chest radiographs from an intensive care unit (ICU), an emergency department, and health checkups were retrospectively collected between January 2020 and December 2022, and outpatient chest radiographs were sourced from a public dataset. An automated report-generating AI algorithm was then applied. A panel of seven thoracic radiologists evaluated the acceptability of generated reports, and acceptability was analyzed using a standard criterion (acceptable without revision or with minor revision) and a stringent criterion (acceptable without revision). Using chest radiographs from three of the contexts (excluding the ICU), AI-generated and radiologist-written reports were compared regarding the acceptability of the reports (generalized linear mixed model) and their sensitivity and specificity for identifying referable abnormalities (McNemar test). The radiologist panel was surveyed to evaluate their perspectives on the potential of AI-generated reports to replace radiologist-written reports. Results The chest radiographs of 1539 individuals (median age, 55 years; 656 male patients, 483 female patients, 400 patients of unknown sex) were included. There was no evidence of a difference in acceptability between AI-generated and radiologist-written reports under the standard criterion (88.4% vs 89.2%; <i>P</i> = .36), but AI-generated reports were less acceptable than radiologist-written reports under the stringent criterion (66.8% vs 75.7%; <i>P</i> < .001). Compared with radiologist-written reports, AI-generated reports identified radiographs with referable abnormalities with greater sensitivity (81.2% vs 59.4%; <i>P</i> < .001) and lower specificity (81.0% vs 93.6%; <i>P</i> < .001). In the survey, most radiologists indicated that AI-generated reports were not yet reliable enough to replace radiologist-written reports. Conclusion AI-generated chest radiograph reports had similar acceptability to radiologist-written reports, although a substantial proportion of AI-generated reports required minor revision. © RSNA, 2025 <i>Supplemental material is available for this article.</i> See also the editorial by Wu and Seo in this issue.</p>\",\"PeriodicalId\":20896,\"journal\":{\"name\":\"Radiology\",\"volume\":\"316 3\",\"pages\":\"e250568\"},\"PeriodicalIF\":15.2000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1148/radiol.250568\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1148/radiol.250568","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
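For readers unfamiliar with the paired analysis named in Materials and Methods, the sketch below shows how a McNemar comparison of sensitivities can be run on paired report-level correctness. It assumes Python with statsmodels; the discordance counts are hypothetical (chosen to roughly mirror the reported sensitivities of 81.2% vs 59.4%) and are not the study's data.

```python
# Minimal sketch of the paired sensitivity comparison described in
# Materials and Methods (McNemar test). Counts are hypothetical,
# for illustration only; they are not the study's data.
from statsmodels.stats.contingency_tables import mcnemar

# Paired 2x2 table over radiographs with referable abnormalities:
# rows = AI-generated report (correct, incorrect),
# cols = radiologist-written report (correct, incorrect).
table = [
    [110, 52],  # both correct / AI correct only
    [9, 29],    # radiologist correct only / both incorrect
]

n = sum(sum(row) for row in table)
sens_ai = (table[0][0] + table[0][1]) / n   # AI report correct
sens_rad = (table[0][0] + table[1][0]) / n  # radiologist report correct
print(f"sensitivity: AI={sens_ai:.3f}, radiologist={sens_rad:.3f}")

# Exact McNemar test uses only the discordant pairs (here 52 vs 9).
result = mcnemar(table, exact=True)
print(f"McNemar statistic={result.statistic}, p={result.pvalue:.4f}")
```

The exact (binomial) form is preferable when discordant counts are small; with larger samples, the chi-square approximation (exact=False) gives similar results.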