A Quantitative Comparison Between Human and Artificial Intelligence in the Detection of Focal Cortical Dysplasia.

IF 8; Zone 1 (Medicine); Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Investigative Radiology. Pub Date: 2025-04-01. Epub Date: 2024-10-23. DOI: 10.1097/RLI.0000000000001125

Lennart Walger, Tobias Bauer, David Kügler, Matthias H Schmitz, Fabiane Schuch, Christophe Arendt, Tobias Baumgartner, Johannes Birkenheier, Valeri Borger, Christoph Endler, Franziska Grau, Christian Immanuel, Markus Kölle, Patrick Kupczyk, Asadeh Lakghomi, Sarah Mackert, Elisabeth Neuhaus, Julia Nordsiek, Anna-Maria Odenthal, Karmele Olaciregui Dague, Laura Ostermann, Jan Pukropski, Attila Racz, Klaus von der Ropp, Frederic Carsten Schmeel, Felix Schrader, Aileen Sitter, Alexander Unruh-Pinheiro, Marilia Voigt, Martin Vychopen, Philip von Wedel, Randi von Wrede, Ulrike Attenberger, Hartmut Vatter, Alexandra Philipsen, Albert Becker, Martin Reuter, Elke Hattingen, Josemir W Sander, Alexander Radbruch, Rainer Surges, Theodor Rüber

Pages: 253-259; Publication type: Journal Article; Open access: no; Platform: Semantic Scholar

Citations: 0

Abstract


A Quantitative Comparison Between Human and Artificial Intelligence in the Detection of Focal Cortical Dysplasia.

Objectives: Artificial intelligence (AI) is thought to improve lesion detection. However, a lack of knowledge about human performance prevents a comparative evaluation of AI and an accurate assessment of its impact on clinical decision-making. The objective of this work is to quantitatively evaluate the ability of humans to detect focal cortical dysplasia (FCD), compare it to state-of-the-art AI, and determine how it may aid diagnostics.

Materials and methods: We prospectively recorded the performance of readers in detecting FCDs using single points and 3-dimensional bounding boxes. We acquired predictions of 3 AI models for the same dataset and compared these to readers. Finally, we analyzed pairwise combinations of readers and models.
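The abstract does not state how a reader's marking was scored against a lesion or how pairs were combined. As a minimal sketch only, the following assumes that a single-point marking counts as a detection when it falls inside a lesion's 3-dimensional bounding box, and that a reader-model pair detects a lesion when either member does. The names `point_in_box`, `combine_pair`, and `detection_rate`, and all values in the usage note, are hypothetical illustrations and not the authors' implementation or data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner); hypothetical encoding


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the axis-aligned 3D box (inclusive)."""
    lo, hi = box
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))


def combine_pair(hits_a: Sequence[bool], hits_b: Sequence[bool]) -> List[bool]:
    """Assumed pairing rule: a case is detected if either member detected it."""
    if len(hits_a) != len(hits_b):
        raise ValueError("both members must be scored on the same cases")
    return [a or b for a, b in zip(hits_a, hits_b)]


def detection_rate(hits: Sequence[bool]) -> float:
    """Fraction of cases marked as detected."""
    if not hits:
        raise ValueError("no cases to score")
    return sum(hits) / len(hits)
```

With toy inputs, `combine_pair([True, False, False, True], [False, True, False, True])` marks three of four cases as detected, so a union-style pair can only match or exceed either member alone. It says nothing about false positives, which would need to be tracked separately.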

Results: Twenty-eight readers, including 20 nonexpert and 5 expert physicians, reviewed 180 cases: 146 subjects with FCD (median age: 25, interquartile range: 18) and 34 healthy control subjects (median age: 43, interquartile range: 19). Nonexpert readers detected 47% (95% confidence interval [CI]: 46, 49) of FCDs, whereas experts detected 68% (95% CI: 65, 71). The 3 AI models detected 32%, 51%, and 72% of FCDs, respectively. The latter, however, also predicted more than 13 false-positive clusters per subject on average. Human performance was improved in the presence of a transmantle sign (P < 0.001) and cortical thickening (P < 0.001). In contrast, AI models were sensitive to abnormal gyration (P < 0.01) or gray-white matter blurring (P < 0.01). Compared with single experts, expert-expert pairs detected 13% (95% CI: 9, 18) more FCDs (P < 0.001). All AI models increased expert detection rates by up to 19% (95% CI: 15, 24) (P < 0.001). Nonexpert+AI pairs could still outperform single experts by up to 13% (95% CI: 10, 17).
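The results report detection rates with 95% confidence intervals, but the abstract does not say how those intervals were computed. Purely as an illustration of how an interval around a detection proportion can be obtained, the sketch below uses the Wilson score interval. That choice is an assumption and may differ from the authors' method, and the function name `wilson_interval` and the counts in the usage note are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(detected: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a proportion; the default z gives about 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must lie between 0 and total")
    p = detected / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return center - half, center + half
```

For a hypothetical 47 detections out of 100 readings, `wilson_interval(47, 100)` returns bounds that bracket 0.47. The width of the interval shrinks as the number of readings grows, which is one reason pooled reader groups can show narrow intervals.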

Conclusions: This study pioneers the comparative evaluation of humans and AI for FCD lesion detection. It shows that AI and human predictions differ, especially for certain MRI features of FCD, and, thus, how AI may complement the diagnostic workup.


Source journal: Investigative Radiology (Medicine: Nuclear Medicine)

CiteScore: 15.10

Self-citation rate: 16.40%

Articles published: 188

Review time: 4-8 weeks

About the journal: Investigative Radiology publishes original, peer-reviewed reports on clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, and related modalities. Emphasis is on early and timely publication. Primarily research-oriented, the journal also includes a wide variety of features of interest to clinical radiologists.