EpiSemoLLM: A Fine-tuned Large Language Model for Epileptogenic Zone Localization Based on Seizure Semiology with a Performance Comparable to Epileptologists

Shihao Yang, Yaxi Luo, Meng Jiao, Neel Fotedar, Vikram R. Rao, Xinglong Ju, Shasha Wu, Xiaochen Xian, Hai Sun, Ioannis Karakis, Danilo Bernardo, Josh Laing, Patrick Kwan, Felix Rosenow, Feng Liu
medRxiv - Neurology, published 2024-09-17. DOI: 10.1101/2024.09.16.24313764 (https://doi.org/10.1101/2024.09.16.24313764)

Abstract

Significance: Seizure semiology, the study of signs and clinical manifestations during seizure episodes, provides crucial information for inferring the location of the epileptogenic zone (EZ). Given the descriptive nature of seizure semiology and recent advances in large language models (LLMs), LLMs have the potential to improve EZ localization accuracy by interpreting semiology descriptions and mapping them to the corresponding EZs. This study introduces the Epilepsy Semiology Large Language Model, or EpiSemoLLM, the first fine-tuned LLM designed specifically for this purpose, built upon the Mistral-7B foundational model.

Method: A total of 865 cases, each containing seizure semiology descriptions paired with EZs validated by intracranial EEG recording and postoperative surgical outcome, were collected from 189 publications. This cohort of semiology descriptions and EZs served as high-quality, domain-specific data for fine-tuning the foundational LLM to improve its ability to predict the most likely EZs. To evaluate the fine-tuned EpiSemoLLM, 100 well-defined cases were tested by comparing its responses with those from a panel of 5 epileptologists. The responses were graded using the rectified reliability score (rRS) and the regional accuracy rate (RAR). Additionally, the performance of EpiSemoLLM was compared with that of its foundational model, Mistral-7B, and with various versions of ChatGPT and Llama as other representative LLMs.

Result: In the comparison with the panel of epileptologists, EpiSemoLLM achieved the following RAR scores with zero-shot prompts: 60.71% for the frontal lobe, 83.33% for the temporal lobe, 63.16% for the occipital lobe, 45.83% for the parietal lobe, 33.33% for the insular cortex, and 28.57% for the cingulate cortex, with a mean rRS of 0.291. In comparison, the epileptologists' averaged RAR scores were 64.83% for the frontal lobe, 52.22% for the temporal lobe, 60.00% for the occipital lobe, 42.50% for the parietal lobe, 46.00% for the insular cortex, and 8.57% for the cingulate cortex, with a mean rRS of 0.148. Notably, the fine-tuned EpiSemoLLM outperformed its foundational LLM, Mistral-7B-instruct, and various versions of ChatGPT and Llama, particularly in localizing EZs in the insular and cingulate cortex. EpiSemoLLM offers valuable information for presurgical evaluations by identifying the most likely EZ location based on seizure semiology.

Conclusion: EpiSemoLLM demonstrates performance comparable to that of epileptologists in inferring EZs from patients' seizure semiology, highlighting its value in epilepsy presurgical assessment. EpiSemoLLM outperformed epileptologists in interpreting seizure semiology with EZs originating from the temporal and parietal lobes, as well as the insular cortex. Conversely, epileptologists outperformed EpiSemoLLM regarding EZ localizations in the frontal and occipital lobes and the cingulate cortex. The model's superior performance compared to the foundational model underscores the effectiveness of fine-tuning LLMs with high-quality, domain-specific samples.
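
The zero-shot figures quoted in the Result paragraph can be lined up region by region. The following is a minimal sketch that only restates the numbers reported in the abstract and subtracts them; the names `REGIONS`, `EPISEMOLLM_RAR`, `EPILEPTOLOGIST_RAR`, and `rar_differences` are illustrative and not taken from the paper, and the abstract does not define how RAR or rRS are computed, so no scoring logic is reproduced here.

```python
# RAR values (percent) as quoted in the abstract for zero-shot prompts.
# All identifiers below are hypothetical names chosen for this sketch.
REGIONS = [
    "frontal lobe",
    "temporal lobe",
    "occipital lobe",
    "parietal lobe",
    "insular cortex",
    "cingulate cortex",
]
EPISEMOLLM_RAR = [60.71, 83.33, 63.16, 45.83, 33.33, 28.57]
EPILEPTOLOGIST_RAR = [64.83, 52.22, 60.00, 42.50, 46.00, 8.57]

# Mean rectified reliability scores quoted in the abstract.
EPISEMOLLM_RRS = 0.291
EPILEPTOLOGIST_RRS = 0.148


def rar_differences():
    """Return {region: EpiSemoLLM RAR minus epileptologists' mean RAR}.

    Values are in percentage points, rounded to two decimals.
    """
    return {
        region: round(model - panel, 2)
        for region, model, panel in zip(
            REGIONS, EPISEMOLLM_RAR, EPILEPTOLOGIST_RAR
        )
    }


if __name__ == "__main__":
    for region, diff in rar_differences().items():
        leader = "EpiSemoLLM" if diff > 0 else "epileptologists"
        print(f"{region:<17} {diff:+7.2f} pp  ({leader} higher)")
    rrs_gap = round(EPISEMOLLM_RRS - EPILEPTOLOGIST_RRS, 3)
    print(f"mean rRS gap      {rrs_gap:+.3f}")
```

On these zero-shot figures alone, EpiSemoLLM's RAR is higher for the temporal, occipital, and parietal lobes and the cingulate cortex, while the epileptologists' mean RAR is higher for the frontal lobe and the insular cortex.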