MLSPred-bench: Transforming electroencephalography (EEG) datasets into machine learning-ready epileptic seizure prediction benchmarks
Authors: Umair Mohammad, Fahad Saeed
Journal: MethodsX, volume 15, Article 103574 (published 2025-08-22)
DOI: 10.1016/j.mex.2025.103574
URL: https://www.sciencedirect.com/science/article/pii/S2215016125004182
Predicting epileptic seizures is significantly more challenging than detecting them. However, most publicly available electroencephalography (EEG) datasets are geared towards detection because the ictal phase (main symptomatic period) is annotated. In contrast, prediction requires annotated preictal and interictal phases. To this end, we designed and developed a method called MLSPred-Bench that converts any large EEG dataset annotated for detection into ML-ready data suitable for prediction. We apply our method to the existing EEG data corpus to generate 12 ML-ready benchmarks comprising data for training, validating, and testing seizure prediction models. Our strategy varies the seizure prediction horizon (SPH) and the seizure occurrence period (SOP) to produce more than 150 GB of ML-ready data. To illustrate the usefulness of the generated data, we technically validate all the benchmarks using multiple machine learning (ML) and deep learning (DL) models. We hope that computational groups will use the generated benchmarking data to develop their seizure prediction models.
The work can be summarized as follows:
1. Extract short preictal and interictal segments from long-duration annotated EEG montages.
2. Generate a comprehensive list of ML-ready benchmarks with varying SPH and SOP.
3. Technically validate the generated data with multiple ML and DL models, reaching up to 88.73% validation accuracy.
4. Open-source code and related materials are available at https://github.com/pcdslab/MLSPred-Bench.
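
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MLSPred-Bench implementation: it assumes the common convention that the preictal window lasts SOP minutes and ends SPH minutes before the annotated seizure onset, and it takes an interictal window of the same length that ends a fixed gap before the onset. The names `Window`, `build_benchmark`, `gap_min`, and `segment_s`, along with all numeric values, are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Window:
    start_s: float  # seconds from the start of the recording
    end_s: float    # seconds from the start of the recording
    label: str      # "preictal" or "interictal"


def preictal_window(onset_s, sph_min, sop_min):
    # Assumed convention: SOP minutes long, ending SPH minutes before onset.
    end_s = onset_s - sph_min * 60
    return Window(end_s - sop_min * 60, end_s, "preictal")


def interictal_window(onset_s, sop_min, gap_min):
    # Hypothetical rule: same length as the preictal window,
    # ending gap_min minutes before onset.
    end_s = onset_s - gap_min * 60
    return Window(end_s - sop_min * 60, end_s, "interictal")


def is_clear_of_seizures(window, onsets_s, gap_min):
    # True if the window is at least gap_min minutes away from every onset.
    gap_s = gap_min * 60
    return all(
        window.end_s <= onset - gap_s or window.start_s >= onset + gap_s
        for onset in onsets_s
    )


def split_into_segments(window, segment_s):
    # Cut a window into non-overlapping fixed-length segments;
    # any remainder shorter than segment_s is dropped.
    count = int((window.end_s - window.start_s) // segment_s)
    return [
        Window(
            window.start_s + i * segment_s,
            window.start_s + (i + 1) * segment_s,
            window.label,
        )
        for i in range(count)
    ]


def build_benchmark(onsets_s, sph_min, sop_min, gap_min=60, segment_s=5.0):
    # Simplification: preictal windows are not checked against earlier seizures.
    segments = []
    for onset_s in onsets_s:
        pre = preictal_window(onset_s, sph_min, sop_min)
        inter = interictal_window(onset_s, sop_min, gap_min)
        if pre.start_s >= 0:  # skip windows that start before the recording
            segments.extend(split_into_segments(pre, segment_s))
        if inter.start_s >= 0 and is_clear_of_seizures(inter, onsets_s, gap_min):
            segments.extend(split_into_segments(inter, segment_s))
    return segments


if __name__ == "__main__":
    onsets = [7200.0]  # one seizure, two hours into a hypothetical recording
    # Illustrative SPH/SOP grid in minutes; not the values used in the paper.
    for sph_min, sop_min in product([2, 5], [1, 2]):
        segments = build_benchmark(onsets, sph_min, sop_min)
        print(f"SPH={sph_min} min, SOP={sop_min} min: {len(segments)} segments")
```

Each (SPH, SOP) pair yields one labelled set of segment boundaries, which is how a single detection-annotated corpus can give rise to several prediction benchmarks. Reading the actual EEG samples for each boundary pair, and splitting the result into training, validation, and test sets, is left out of this sketch.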