A public benchmark for human performance in the detection of focal cortical dysplasia

IF 2.8 · Zone 3 (Medicine) · Q2 Clinical Neurology

Epilepsia Open · Pub Date: 2025-04-01 · DOI: 10.1002/epi4.70028

Lennart Walger, Matthias H. Schmitz, Tobias Bauer, David Kügler, Fabiane Schuch, Christophe Arendt, Tobias Baumgartner, Johannes Birkenheier, Valeri Borger, Christoph Endler, Franziska Grau, Christian Immanuel, Markus Kölle, Patrick Kupczyk, Asadeh Lakghomi, Sarah Mackert, Elisabeth Neuhaus, Julia Nordsiek, Anna-Maria Odenthal, Karmele Olaciregui Dague, Laura Ostermann, Jan Pukropski, Attila Racz, Klaus von der Ropp, Frederic Carsten Schmeel, Felix Schrader, Aileen Sitter, Alexander Unruh-Pinheiro, Marilia Voigt, Martin Vychopen, Philip von Wedel, Randi von Wrede, Ulrike Attenberger, Hartmut Vatter, Alexandra Philipsen, Albert Becker, Martin Reuter, Elke Hattingen, Alexander Radbruch, Rainer Surges, Theodor Rüber

Epilepsia Open, volume 10, issue 3, pages 778-786. Full text: https://onlinelibrary.wiley.com/doi/10.1002/epi4.70028


Abstract

Objective

This study aims to report human performance in the detection of focal cortical dysplasias (FCDs) using an openly available dataset. Additionally, it defines a subset of this dataset as a “difficult” test set to establish a public baseline benchmark against which new methods for automated FCD detection can be evaluated.

Methods

The performance of 28 human readers with varying levels of expertise in detecting FCDs was originally analyzed on 146 subjects, not all of which are openly available; here, we analyzed the openly available subset of 85 cases. Performance was measured by the overlap between predicted regions of interest (ROIs) and ground-truth lesion masks, using the Dice-Soerensen coefficient (DSC). The benchmark test set was chosen to consist of 15 subjects most predictive of human performance and 13 subjects identified by at most 3 of the 28 readers.
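
As an illustration of the two quantities named in the Methods, the following minimal sketch computes the DSC between a reader's ROI and a lesion mask, and selects subjects found by at most 3 readers. It rests on assumptions that the abstract does not state: masks are represented as collections of voxel coordinates, two empty masks score 0.0, and a per-reader detection flag is already available for each subject. The function names `dice_coefficient` and `rarely_detected` and the subject IDs are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(roi: Iterable[Voxel], lesion: Iterable[Voxel]) -> float:
    """Dice-Soerensen coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    roi_set, lesion_set = set(roi), set(lesion)
    total = len(roi_set) + len(lesion_set)
    if total == 0:
        # Assumption: two empty masks score 0.0; the abstract does not cover this case.
        return 0.0
    return 2 * len(roi_set & lesion_set) / total


def rarely_detected(detections: Dict[str, List[bool]], max_readers: int = 3) -> List[str]:
    """Return subjects whose lesion was found by at most `max_readers` readers."""
    return sorted(s for s, hits in detections.items() if sum(hits) <= max_readers)


if __name__ == "__main__":
    roi = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
    lesion = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
    # Two shared voxels, three voxels in each mask: 2*2 / (3+3)
    print(round(dice_coefficient(roi, lesion), 3))  # 0.667

    # Hypothetical detection table: subject ID -> one flag per reader (28 readers).
    detections = {
        "sub-A": [True] * 2 + [False] * 26,
        "sub-B": [True] * 20 + [False] * 8,
        "sub-C": [True] * 3 + [False] * 25,
    }
    print(rarely_detected(detections))  # ['sub-A', 'sub-C']
```

How a reader's ROI is converted into a "detected" flag (for example, a DSC threshold) is not specified in the abstract, so the sketch takes the flags as given.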

Results

Expert readers achieved an average detection rate of 68%, compared to 45% for non-experts and 27% for laypersons. Neuroradiologists detected the highest percentage of lesions (64%), while psychiatrists detected the lowest (34%). Neurosurgeons had the highest ROI sensitivity (0.70), and psychiatrists had the highest ROI precision (0.78). On the benchmark test set, the expert detection rate was 49%.
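
The abstract does not define ROI sensitivity, ROI precision, or detection rate. The sketch below assumes the usual voxel-wise definitions (sensitivity is the share of lesion voxels covered by the ROI, precision is the share of ROI voxels that lie inside the lesion) and treats the detection rate as the fraction of cases flagged as detected. All function names are hypothetical, and the example numbers are made up for illustration, not study data.

```python
from typing import Iterable, List, Tuple

Voxel = Tuple[int, int, int]


def roi_sensitivity(roi: Iterable[Voxel], lesion: Iterable[Voxel]) -> float:
    """Fraction of lesion voxels covered by the ROI (assumed definition)."""
    roi_set, lesion_set = set(roi), set(lesion)
    return len(roi_set & lesion_set) / len(lesion_set) if lesion_set else 0.0


def roi_precision(roi: Iterable[Voxel], lesion: Iterable[Voxel]) -> float:
    """Fraction of ROI voxels that fall inside the lesion (assumed definition)."""
    roi_set, lesion_set = set(roi), set(lesion)
    return len(roi_set & lesion_set) / len(roi_set) if roi_set else 0.0


def detection_rate(detected: List[bool]) -> float:
    """Fraction of cases in which the lesion was flagged as detected."""
    return sum(detected) / len(detected) if detected else 0.0


if __name__ == "__main__":
    # Toy masks along one axis: the lesion has 5 voxels, the ROI has 4, and 3 overlap.
    lesion = [(x, 0, 0) for x in range(5)]   # x = 0..4
    roi = [(x, 0, 0) for x in range(2, 6)]   # x = 2..5
    print(roi_sensitivity(roi, lesion))  # 3 / 5 = 0.6
    print(roi_precision(roi, lesion))    # 3 / 4 = 0.75
    print(detection_rate([True, False, True, True]))  # 3 / 4 = 0.75
```

Under these definitions a small, well-placed ROI yields high precision but low sensitivity, which would be one way to read a group having the highest ROI precision while detecting the fewest lesions.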

Significance

Reporting human performance in FCD detection provides a critical baseline for assessing the effectiveness of automated detection methods in a clinically relevant context. The defined benchmark test set serves as a useful indicator for evaluating advancements in computer-aided FCD detection approaches.

Plain Language Summary

Focal cortical dysplasias (FCDs) are malformations of cortical development and one of the most common causes of drug-resistant focal epilepsy. Once found, FCDs can be neurosurgically resected, which leads to seizure freedom in many cases. However, FCDs are difficult to detect in the visual assessment of magnetic resonance imaging. A myriad of algorithms for automated FCD detection have been developed, but their true clinical value remains unclear since there is no benchmark dataset for evaluation and comparison to human performance. Here, we use human FCD detection performance to define a benchmark dataset with which new methods for automated detection can be evaluated.
