Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study.
Adam Remaki, Jacques Ung, Pierre Pages, Perceval Wajsburt, Elise Liu, Guillaume Faure, Thomas Petit-Jean, Xavier Tannier, Christel Gérardin
{"title":"Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study.","authors":"Adam Remaki, Jacques Ung, Pierre Pages, Perceval Wajsburt, Elise Liu, Guillaume Faure, Thomas Petit-Jean, Xavier Tannier, Christel Gérardin","doi":"10.2196/68704","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Valuable insights gathered by clinicians during their inquiries and documented in textual reports are often unavailable in the structured data recorded in electronic health records (EHRs).</p><p><strong>Objective: </strong>This study aimed to highlight that mining unstructured textual data with natural language processing techniques complements the available structured data and enables more comprehensive patient phenotyping. A proof-of-concept for patients diagnosed with specific autoimmune diseases is presented, in which the extraction of information on laboratory tests and drug treatments is performed.</p><p><strong>Methods: </strong>We collected EHRs available in the clinical data warehouse of the Greater Paris University Hospitals from 2012 to 2021 for patients hospitalized and diagnosed with 1 of 4 immune-mediated inflammatory diseases: systemic lupus erythematosus, systemic sclerosis, antiphospholipid syndrome, and Takayasu arteritis. Then, we built, trained, and validated natural language processing algorithms on 103 discharge summaries selected from the cohort and annotated by a clinician. Finally, all discharge summaries in the cohort were processed with the algorithms, and the extracted data on laboratory tests and drug treatments were compared with the structured data.</p><p><strong>Results: </strong>Named entity recognition followed by normalization yielded F<sub>1</sub>-scores of 71.1 (95% CI 63.6-77.8) for the laboratory tests and 89.3 (95% CI 85.9-91.6) for the drugs. Application of the algorithms to 18,604 EHRs increased the detection of antibody results and drug treatments. For instance, among patients in the systemic lupus erythematosus cohort with positive antinuclear antibodies, the rate increased from 18.34% (752/4102) to 71.87% (2949/4102), making the results more consistent with the literature.</p><p><strong>Conclusions: </strong>While challenges remain in standardizing laboratory tests, particularly with abbreviations, this work, based on secondary use of clinical data, demonstrates that automated processing of discharge summaries enriched the information available in structured data and facilitated more comprehensive patient profiling.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e68704"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12018854/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/68704","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Valuable insights gathered by clinicians during their inquiries and documented in textual reports are often unavailable in the structured data recorded in electronic health records (EHRs).
Objective: This study aimed to highlight that mining unstructured textual data with natural language processing techniques complements the available structured data and enables more comprehensive patient phenotyping. A proof-of-concept for patients diagnosed with specific autoimmune diseases is presented, in which the extraction of information on laboratory tests and drug treatments is performed.
Methods: We collected EHRs available in the clinical data warehouse of the Greater Paris University Hospitals from 2012 to 2021 for patients hospitalized and diagnosed with 1 of 4 immune-mediated inflammatory diseases: systemic lupus erythematosus, systemic sclerosis, antiphospholipid syndrome, and Takayasu arteritis. Then, we built, trained, and validated natural language processing algorithms on 103 discharge summaries selected from the cohort and annotated by a clinician. Finally, all discharge summaries in the cohort were processed with the algorithms, and the extracted data on laboratory tests and drug treatments were compared with the structured data.
Results: Named entity recognition followed by normalization yielded F1-scores of 71.1 (95% CI 63.6-77.8) for the laboratory tests and 89.3 (95% CI 85.9-91.6) for the drugs. Application of the algorithms to 18,604 EHRs increased the detection of antibody results and drug treatments. For instance, among patients in the systemic lupus erythematosus cohort with positive antinuclear antibodies, the rate increased from 18.34% (752/4102) to 71.87% (2949/4102), making the results more consistent with the literature.
Conclusions: While challenges remain in standardizing laboratory tests, particularly with abbreviations, this work, based on secondary use of clinical data, demonstrates that automated processing of discharge summaries enriched the information available in structured data and facilitated more comprehensive patient profiling.
背景:临床医生在调查过程中收集的有价值的见解并记录在文本报告中,通常无法在电子健康记录(EHRs)中记录的结构化数据中获得。目的:本研究旨在强调使用自然语言处理技术挖掘非结构化文本数据是对现有结构化数据的补充,并使患者表型更全面。提出了对诊断为特定自身免疫性疾病的患者进行概念验证,其中提取了有关实验室测试和药物治疗的信息。方法:收集2012年至2021年大巴黎大学医院临床数据仓库中诊断为系统性红斑狼疮、系统性硬化症、抗磷脂综合征和高松动脉炎等4种免疫介导炎症性疾病中1种的住院患者的电子病历。然后,我们建立、训练并验证了自然语言处理算法,这些算法是从队列中选择的103个出院摘要,并由临床医生注释。最后,对队列中所有出院摘要进行算法处理,并将提取的实验室检查和药物治疗数据与结构化数据进行比较。结果:命名实体识别后进行归一化,实验室检查的f1得分为71.1分(95% CI 63.6-77.8),药物的f1得分为89.3分(95% CI 85.9-91.6)。将该算法应用于18604份电子病历,提高了抗体结果的检测和药物治疗。例如,在系统性红斑狼疮患者中,抗核抗体阳性的患者比例从18.34%(752/4102)增加到71.87%(2949/4102),使结果与文献更加一致。结论:虽然在标准化实验室测试方面仍存在挑战,特别是在缩写方面,但这项基于临床数据二次使用的工作表明,出院摘要的自动化处理丰富了结构化数据中可用的信息,并促进了更全面的患者概况。
期刊介绍:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.