From data to decision: Scaling artificial intelligence with informatics for epilepsy management

Impact factor 7.9 · CAS Tier 1 (Medicine) · JCR Q1, Medicine, Research & Experimental
Nishant Sinha, Alfredo Lucas, Kathryn Adamiak Davis
Clinical and Translational Medicine, volume 14, issue 12. Published 13 December 2024. DOI: 10.1002/ctm2.70108

Abstract

The integration of artificial intelligence (AI) into epilepsy research presents a critical opportunity to revolutionize the management of this complex neurological disorder.1 Despite significant advancements in developing AI algorithms to diagnose and manage epilepsy, their translation into clinical practice remains limited. This gap underscores the urgent need for scalable AI and neuroinformatics approaches that can bridge the divide between research and real-world application.2 The ability to generalize AI models from controlled research environments to diverse clinical settings is crucial. Current efforts have made substantial progress, but they also reveal common pitfalls, such as overestimation of model performance due to data leakage and the challenges of small sample sizes, which hinder the generalization of these models.

To address these challenges and fully realize the potential of AI in epilepsy care, a robust framework for data sharing and collaboration across research centres is essential. Cloud-based informatics platforms offer a promising solution by enabling the aggregation and harmonization of large, multisite datasets. These platforms can facilitate the development of AI models that are not only powerful but also scalable and generalizable across different patient populations and clinical scenarios. In this commentary, we will explore the common methodological errors that lead to overly optimistic AI models in epilepsy research and propose strategies to overcome these issues. We will also discuss the importance of collaborative data sharing in building robust, clinically relevant AI tools and highlight the role of advanced neuroinformatics infrastructures in supporting the translational pathway from research to clinical practice (Figure 1).

The promise of AI in epilepsy research is often hampered by methodological errors that lead to overly optimistic performance metrics. One of the most significant issues is data leakage, which occurs when information from outside the training dataset influences the model, resulting in an overestimation of its predictive power. This can happen when features are derived from the entire dataset rather than just the training subset.3 To mitigate this, strict separation between training and test datasets is essential, and feature selection must be performed independently within each fold of the cross-validation process. Nested cross-validation, where model selection and performance estimation are conducted separately, further reduces the risk of data leakage.
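The contrast between leaky and leak-free feature selection can be sketched with scikit-learn; synthetic tabular data stands in for real epilepsy features, and the dataset sizes and parameters are illustrative only. The key point is that a `Pipeline` refits the feature selector inside each training fold, so held-out samples never influence which features are chosen.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a small, high-dimensional epilepsy dataset.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)

# WRONG: selecting features on the full dataset leaks test-fold
# information into training and inflates cross-validated scores.
leaky_features = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(LogisticRegression(max_iter=1000),
                              leaky_features, y, cv=5).mean()

# RIGHT: the Pipeline refits the selector within each training fold,
# so the held-out fold never influences feature selection.
pipe = Pipeline([("select", SelectKBest(f_classif, k=20)),
                 ("clf", LogisticRegression(max_iter=1000))])
clean_score = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:     {leaky_score:.2f}")
print(f"leak-free CV accuracy: {clean_score:.2f}")
```

With mostly uninformative features, the leaky estimate is typically noticeably higher, illustrating exactly the overestimation the text describes.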

Another common error is the improper application of cross-validation techniques. Often, researchers perform feature selection or hyperparameter tuning on the entire dataset before cross-validation, leading to inflated performance metrics. The correct approach is to embed these steps within each fold of the cross-validation process to ensure that the test data remain completely unseen until the final evaluation. This practice helps prevent overfitting and provides a more accurate estimate of how the model will perform on new data.
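Nesting the tuning loop inside the evaluation loop is straightforward in scikit-learn: a `GridSearchCV` (inner loop, model selection) is passed as the estimator to `cross_val_score` (outer loop, performance estimation). The model, grid, and data below are illustrative placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a modest clinical dataset.
X, y = make_classification(n_samples=150, n_features=50,
                           n_informative=8, random_state=0)

# Inner loop: hyperparameter tuning, refit independently per outer fold.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: performance estimated on data the tuning never saw.
nested_scores = cross_val_score(inner, X, y, cv=5)
print(f"nested CV accuracy: {nested_scores.mean():.2f} "
      f"(+/- {nested_scores.std():.2f})")
```

Because each outer test fold is untouched by the inner grid search, the resulting estimate reflects how the full modelling procedure, tuning included, would perform on new data.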

Small sample size presents a third challenge, particularly in epilepsy research, where datasets are often of modest size and heterogeneous. Small datasets can lead to overfitting, where the model learns patterns specific to the training data but fails to generalize to new data. Addressing this requires both methodological rigour and collaborative efforts to pool data across multiple sites, thereby creating larger, more diverse datasets. Data augmentation techniques, such as generating synthetic data, can also help increase the effective size of the training set.
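As a minimal sketch of augmentation for small tabular datasets, the helper below enlarges a training set with Gaussian-jittered copies of each sample. This is a deliberately simple stand-in for the more sophisticated synthetic-data methods the text alludes to (e.g. SMOTE or generative models); the function name, noise scale, and data shapes are all assumptions for illustration.

```python
import numpy as np

def augment_with_jitter(X, y, n_copies=2, noise_scale=0.05, seed=0):
    """Enlarge a small training set with Gaussian-jittered copies.

    noise_scale is relative to each feature's standard deviation, so
    features on different scales receive proportionate perturbations.
    """
    rng = np.random.default_rng(seed)
    sigma = X.std(axis=0, keepdims=True) * noise_scale
    copies = [X + rng.normal(0.0, sigma, size=X.shape)
              for _ in range(n_copies)]
    X_aug = np.vstack([X, *copies])
    y_aug = np.concatenate([y] * (n_copies + 1))
    return X_aug, y_aug

# Toy example: 40 subjects, 12 features, binary labels.
X = np.random.default_rng(1).normal(size=(40, 12))
y = np.random.default_rng(2).integers(0, 2, size=40)
X_aug, y_aug = augment_with_jitter(X, y)
print(X_aug.shape, y_aug.shape)  # (120, 12) (120,)
```

Augmentation of this kind should be applied only to training folds, never to held-out test data, or it reintroduces the leakage problem described above.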

The development of robust AI models in epilepsy is further strengthened by collaborative data sharing, which allows researchers to pool datasets from multiple sources, increasing both the size and diversity of the data available for training. Epilepsy is a highly heterogeneous disorder, and individual research centres often have access to only modest-size cohorts. By aggregating data across different sites, researchers can develop AI tools that are more representative of broad clinical reality, improving generalizability and reliability across diverse clinical settings.
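Once data are pooled across centres, cross-site generalization can be measured directly with leave-one-site-out validation: train on all centres but one, test on the held-out centre. The sketch below assumes three hypothetical sites with synthetic data; site labels and sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Pooled data from three hypothetical centres; `site` labels each sample.
X, y = make_classification(n_samples=300, n_features=30, random_state=0)
site = np.repeat([0, 1, 2], 100)

# Leave-one-site-out: train on two centres, test on the held-out one,
# which directly probes generalization across clinical settings.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=site, cv=LeaveOneGroupOut())
print(dict(zip(["site0", "site1", "site2"], scores.round(2))))
```

A large gap between within-site and cross-site scores is a warning sign that a model has learned site-specific artifacts (scanner, protocol, population) rather than disease-relevant patterns.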

Collaborative data sharing also enables the replication of studies, which is critical for validating AI models across different cohorts to ensure that the models are both accurate and reproducible. Such collaboration fosters the sharing of expertise and resources, allowing researchers to tackle complex challenges, such as integrating multimodal data—neuroimaging, electrophysiology and clinical records—into more sophisticated AI models.

To support effective data sharing and utilization across multiple sites, advanced neuroinformatics infrastructures are indispensable. Platforms like EBRAINS, Pennsieve (https://app.pennsieve.io/) and OpenNeuro, among others, provide the technological foundation needed to securely aggregate, manage and analyze large-scale epilepsy datasets.4, 5 These platforms enable researchers to apply standardized methods and tools across different datasets to ensure the rigour, robustness and reproducibility of AI models.

Neuroinformatics platforms also adhere to the FAIR principles of making data findable, accessible, interoperable and reusable, which is crucial for effective data sharing.6 By facilitating data harmonization and integration, these platforms ensure that data from multiple sources can be combined and analyzed consistently.7 Furthermore, neuroinformatics infrastructures support collaborative analysis by allowing researchers to share not just data, but also the algorithms and models developed from those data. For example, researchers could share their electrode localization outputs generated from a standardized pipeline,8 together with their intracranial electroencephalography recordings, and the deep learning model trained for seizure detection. Alternatively, researchers might share only their data,9 and the preprocessing and model building could all happen within these infrastructures.10 This fosters an open science environment where AI models can be tested and refined across different datasets to accelerate the development of clinically applicable tools.

In summary, the advancement of AI in epilepsy research depends on both methodological rigour and collaborative efforts. By addressing common errors in AI model development and leveraging the power of collaborative data sharing, we can build robust, clinically relevant tools. Neuroinformatics infrastructures provide the necessary support for these endeavours to ensure that AI models are not only powerful but also applicable in real-world clinical settings. These combined strategies are essential to translate AI research into tangible improvements in epilepsy care, ultimately leading to better patient outcomes.

Conceptualization: Nishant Sinha, Alfredo Lucas, Kathryn Adamiak Davis. Writing—original draft preparation and revision for intellectual content: Nishant Sinha, Alfredo Lucas, Kathryn Adamiak Davis.

The authors declare no conflict of interest.


Source journal metrics: CiteScore 15.90 · Self-citation rate 1.90% · Articles per year: 450 · Review time: 4 weeks
About the journal: Clinical and Translational Medicine (CTM) is an international, peer-reviewed, open-access journal dedicated to accelerating the translation of preclinical research into clinical applications and fostering communication between basic and clinical scientists. It highlights the clinical potential and application of various fields including biotechnologies, biomaterials, bioengineering, biomarkers, molecular medicine, omics science, bioinformatics, immunology, molecular imaging, drug discovery, regulation, and health policy. With a focus on the bench-to-bedside approach, CTM prioritizes studies and clinical observations that generate hypotheses relevant to patients and diseases, guiding investigations in cellular and molecular medicine. The journal encourages submissions from clinicians, researchers, policymakers, and industry professionals.