Jitse Loyens, Geertruida Slinger, Nynke Doornebal, Kees P J Braun, Eric van Diessen, Willem M Otte
AI language model applications for early diagnosis of childhood epilepsy based on unstructured first-visit patient narratives: A cohort study.
Epileptic Disorders (IF 2.7, JCR Q3, Clinical Neurology). Journal Article. Published 2025-10-03. DOI: 10.1002/epd2.70109 (https://doi.org/10.1002/epd2.70109)
Citations: 0
Abstract
Objective: Language serves as an indispensable source of information for diagnosing epilepsy, and its computational analysis is increasingly explored. This study assessed and compared the diagnostic value of different language model applications in extracting information. The aim was to identify language patterns in first-visit documentation that may contain useful clinical information not overtly considered by the clinician, in order to improve the early diagnosis of childhood epilepsy.
Methods: We analyzed 1561 patient letters from two first-seizure clinics. The dataset was divided into training and test sets to evaluate performance and generalizability. We employed an established Naïve Bayes model as a natural language processing technique and a sentence-embedding (large language) model based on the Bidirectional Encoder Representations from Transformers (BERT) architecture. Both models analyzed only the anamnesis texts as noted by the treating physician. Within the training sets, we identified predictive features consisting of keywords indicative of 'epilepsy' or 'no epilepsy.' Model outputs were compared to the clinician's final diagnosis (gold standard) after a two-year follow-up period. We computed accuracy, sensitivity, and specificity for both models.
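To make the keyword-based approach concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier over word counts. It is not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the class name `NaiveBayesText`, and the six training sentences are all assumptions invented for illustration and do not come from the study data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude, assumed tokenizer)."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class for one text."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (self.total_words[c] + vocab_size))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy sentences, invented for this sketch (not patient data).
train_texts = [
    "sudden stiffening followed by rhythmic jerking and tongue bite",
    "staring spell with lip smacking and confusion afterwards",
    "rhythmic jerking of right arm with drowsiness afterwards",
    "fainted after standing up quickly felt dizzy and pale",
    "breath holding after crying then brief limpness",
    "dizzy and pale before fainting in warm room",
]
train_labels = ["epilepsy"] * 3 + ["no epilepsy"] * 3

model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("rhythmic jerking with confusion afterwards")
shuffled = model.predict("afterwards confusion with jerking rhythmic")
print(prediction, shuffled)
```

Because the score is a sum over individual words, reordering the words of a text leaves the prediction unchanged. A model of this kind therefore uses only which words occur and how often, not their order.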
Results: The Naïve Bayes model achieved an accuracy of 0.73 (95% CI: 0.68-0.78), with a sensitivity of 0.79 (95% CI: 0.74-0.85) and a specificity of 0.62 (95% CI: 0.52-0.72). The sentence-embedding model demonstrated comparable performance with an accuracy of 0.74 (95% CI: 0.68-0.79), a sensitivity of 0.74 (95% CI: 0.68-0.80), and a specificity of 0.73 (95% CI: 0.61-0.84).
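The three reported metrics can be computed from a confusion matrix, as in the sketch below. The abstract does not say how the 95% confidence intervals were obtained, so the Wilson score interval used here is an assumption, as are the function names and the eight toy labels, which are invented and unrelated to the study results.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(y_true, y_pred, positive="epilepsy"):
    """Return {metric: (estimate, (ci_low, ci_high))} for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        # proportion of all cases classified correctly
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
        # proportion of true epilepsy cases that the model flags
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # proportion of non-epilepsy cases that the model clears
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical gold-standard diagnoses and model outputs (toy data only).
y_true = ["epilepsy"] * 4 + ["no epilepsy"] * 4
y_pred = ["epilepsy", "epilepsy", "epilepsy", "no epilepsy",
          "no epilepsy", "no epilepsy", "epilepsy", "epilepsy"]

metrics = diagnostic_metrics(y_true, y_pred)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.2f} (95% CI: {low:.2f}-{high:.2f})")
```

With only a handful of cases the intervals are very wide. The narrower intervals in the abstract reflect the much larger test set.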
Significance: Both models demonstrated relatively good performance in diagnosing childhood epilepsy solely based on the first-visit patient anamnesis text. Notably, the more advanced sentence-embedding model showed no improvement over the computationally simpler Naïve Bayes model. This suggests that modeling of anamnesis data does not depend on word order for this particular classification task. Further refinement and exploration of language models and computational linguistic approaches are necessary to enhance diagnostic accuracy in clinical practice.
About the journal:
Epileptic Disorders is the leading forum where all experts and medical students who wish to improve their understanding of epilepsy and related disorders can share practical experiences surrounding diagnosis and care, natural history, and management of seizures.
Epileptic Disorders is the official E-journal of the International League Against Epilepsy for educational communication. As the journal celebrates its 20th anniversary, it will now be available only as an online version. Its mission is to create educational links between epileptologists and other health professionals in clinical practice and scientists or physicians in research-based institutions. This change is accompanied by an increase in the number of issues per year, from 4 to 6, to ensure regular diffusion of recently published material (high quality Review and Seminar in Epileptology papers; Original Research articles or Case reports of educational value; MultiMedia Teaching Material), to serve the global medical community that cares for those affected by epilepsy.