Extracting seizure control metrics from clinic notes of patients with epilepsy: A natural language processing approach
Marta Fernandes, Aidan Cardall, Lidia MVR Moura, Christopher McGraw, Sahar F. Zafar, M. Brandon Westover
Epilepsy Research, volume 207, Article 107451. Published 2024-09-10. DOI: 10.1016/j.eplepsyres.2024.107451
https://www.sciencedirect.com/science/article/pii/S0920121124001669
Abstract
Objectives
Monitoring seizure control metrics is key to clinical care of patients with epilepsy. Manually abstracting these metrics from unstructured text in electronic health records (EHR) is laborious. We aimed to abstract the date of last seizure and seizure frequency from clinical notes of patients with epilepsy using natural language processing (NLP).
Methods
We extracted seizure control metrics from notes of patients seen in epilepsy clinics at two hospitals in Boston. Extraction of both the date of last seizure and seizure frequency was performed with the pretrained model RoBERTa_for_seizureFrequency_QA, combined with regular expressions. We designed the algorithm to categorize the timing of last seizure (“today”, “1–6 days ago”, “1–4 weeks ago”, “more than 1–3 months ago”, “more than 3–6 months ago”, “more than 6–12 months ago”, “more than 1–2 years ago”, “more than 2 years ago”) and seizure frequency (“innumerable”, “multiple”, “daily”, “weekly”, “monthly”, “once per year”, “less than once per year”). Our ground truth consisted of structured questionnaires filled out by physicians. Model performance was measured using the areas under the receiver operating characteristic curve (AUROC) and the precision-recall curve (AUPRC) for categorical labels, and median absolute error (MAE) for ordinal labels, with 95 % confidence intervals (CI) estimated via bootstrapping.
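The categorization step for the timing of last seizure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category labels are quoted from the Methods, but the day thresholds (for example, treating one month as 30 days), the `MM/DD/YYYY` date pattern, and the function names `parse_date` and `categorize_last_seizure` are assumptions made here for illustration.

```python
import re
from datetime import date
from typing import Optional

# Labels are quoted from the Methods. The upper bounds in days are
# illustrative assumptions (1 month = 30 days, 1 year = 365 days).
LAST_SEIZURE_BINS = [
    (0, "today"),
    (6, "1–6 days ago"),
    (30, "1–4 weeks ago"),
    (90, "more than 1–3 months ago"),
    (180, "more than 3–6 months ago"),
    (365, "more than 6–12 months ago"),
    (730, "more than 1–2 years ago"),
]
OPEN_ENDED_LABEL = "more than 2 years ago"

# Hypothetical pattern: dates written as MM/DD/YYYY in an extracted text span.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def parse_date(text: str) -> Optional[date]:
    """Return the first MM/DD/YYYY date found in the text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = map(int, match.groups())
    return date(year, month, day)


def categorize_last_seizure(visit: date, last_seizure: date) -> str:
    """Map the gap between visit and last seizure to a category label."""
    days = (visit - last_seizure).days
    if days < 0:
        raise ValueError("last seizure date is after the visit date")
    for upper_bound, label in LAST_SEIZURE_BINS:
        if days <= upper_bound:
            return label
    return OPEN_ENDED_LABEL
```

In a full pipeline, the text passed to `parse_date` would be the answer span returned by the question-answering model rather than the whole note; that wiring is omitted here.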
Results
Our cohort included 1773 adult patients with a total of 5658 visits with reported seizure control metrics, seen in epilepsy clinics between December 2018 and May 2022. The average age of the cohort was 42 years, and the majority were female (57 %), White (81 %), and non-Hispanic (85 %). The models achieved an MAE (95 % CI) of 4 (4.00–4.86) weeks for date of last seizure and 0.02 (0.02–0.02) seizures per day for seizure frequency.
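To make the reported metric concrete, the sketch below computes a median absolute error with a bootstrap confidence interval in plain Python. The abstract states only that CIs were estimated via bootstrapping; the percentile method, the number of resamples, and the function names used here are assumptions, not the authors' code.

```python
import random
import statistics


def median_absolute_error(y_true, y_pred):
    """Median of the absolute differences between paired values."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median absolute error (assumed method)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample visit indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            median_absolute_error([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

For example, with toy values in weeks, `median_absolute_error([0, 4, 8, 12, 52], [0, 8, 4, 16, 40])` gives 4, and `bootstrap_ci` on the same lists returns an interval around that point estimate. Because a patient may contribute several visits, resampling by patient rather than by visit may be more appropriate in practice; the abstract does not say which was used.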
Conclusions
Our NLP approach demonstrates that extracting seizure control metrics from EHR is feasible, allowing for large-scale EHR research.
About the journal:
Epilepsy Research provides for publication of high quality articles in both basic and clinical epilepsy research, with a special emphasis on translational research that ultimately relates to epilepsy as a human condition. The journal is intended to provide a forum for reporting the best and most rigorous epilepsy research from all disciplines ranging from biophysics and molecular biology to epidemiological and psychosocial research. As such the journal will publish original papers relevant to epilepsy from any scientific discipline and also studies of a multidisciplinary nature. Clinical and experimental research papers adopting fresh conceptual approaches to the study of epilepsy and its treatment are encouraged. The overriding criteria for publication are novelty, significant clinical or experimental relevance, and interest to a multidisciplinary audience in the broad arena of epilepsy. Review articles focused on any topic of epilepsy research will also be considered, but only if they present an exceptionally clear synthesis of current knowledge and future directions of a research area, based on a critical assessment of the available data or on hypotheses that are likely to stimulate more critical thinking and further advances in an area of epilepsy research.