Can Patients with Dementia Be Identified in Primary Care Electronic Medical Records Using Natural Language Processing?

Impact factor: 5.9 | JCR: Q1 (Computer Science)

Journal of Healthcare Informatics Research, 7(1), pp. 42-58. Published: 2023-01-23; eCollection date: 2023-03-01. DOI: 10.1007/s41666-023-00125-6

Laura C Maclagan, Mohamed Abdalla, Daniel A Harris, Therese A Stukel, Branson Chen, Elisa Candido, Richard H Swartz, Andrea Iaboni, R Liisa Jaakkimainen, Susan E Bronskill


Citations: 0

Abstract


Dementia and mild cognitive impairment can be underrecognized in primary care practice and research. Free-text fields in electronic medical records (EMRs) are a rich source of information that might support increased detection and enable a better understanding of populations at risk of dementia. We used natural language processing (NLP) to identify dementia-related features in EMRs and compared the performance of supervised machine learning models to classify patients with dementia. We assembled a cohort of primary care patients aged 66+ years in Ontario, Canada, from EMR notes collected until December 2016: 526 with dementia and 44,148 without dementia. We identified dementia-related features by applying published lists, clinician input, and NLP with word embeddings to free-text progress and consult notes, and organized the features into thematic groups. Using machine learning models, we compared the performance of features to detect dementia, overall and during time periods relative to dementia case ascertainment in health administrative databases. Over 900 dementia-related features were identified and grouped into eight themes (including symptoms, social, function, cognition). Using notes from all time periods, LASSO had the best performance (F1 score: 77.2%, sensitivity: 71.5%, specificity: 99.8%). Model performance was poor when notes written before case ascertainment were included (F1 score: 14.4%, sensitivity: 8.3%, specificity: 99.9%) but improved as later notes were added. While similar models may eventually improve recognition of cognitive issues and dementia in primary care EMRs, our findings suggest that further research is needed to identify which additional EMR components might be useful to promote early detection of dementia.

Supplementary information: The online version contains supplementary material available at 10.1007/s41666-023-00125-6.
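The F1 score, sensitivity, and specificity quoted in the abstract are all derived from the same confusion matrix, which is why a model can pair a specificity near 100% with a modest F1 when cases are rare. The sketch below is an illustration only, not the authors' evaluation code: it rebuilds approximate confusion-matrix counts from the reported rates and the cohort sizes (526 with dementia, 44,148 without), under the assumption that the rates apply to the whole cohort. The abstract does not say whether a held-out test split was used, so the counts are hypothetical, and the names `Confusion` and `confusion_from_rates` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (positive class = dementia)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        # Share of true cases that the model flags (recall).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-cases that the model correctly leaves unflagged.
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        # Share of flagged patients who are true cases.
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def confusion_from_rates(n_pos: int, n_neg: int,
                         sensitivity: float, specificity: float) -> Confusion:
    """Approximate counts implied by rounded rates (illustrative assumption)."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return Confusion(tp=tp, fp=n_neg - tn, fn=n_pos - tp, tn=tn)


# Cohort sizes and rates are taken from the abstract; applying them to the
# full cohort is an assumption made only for this back-of-envelope check.
all_notes = confusion_from_rates(526, 44_148, 0.715, 0.998)
pre_notes = confusion_from_rates(526, 44_148, 0.083, 0.999)

for label, cm in [("all time periods", all_notes),
                  ("pre-ascertainment", pre_notes)]:
    print(f"{label}: TP={cm.tp} FP={cm.fp} FN={cm.fn} "
          f"precision={cm.precision:.3f} F1={cm.f1:.3f}")
```

Under these assumptions the implied F1 values come out near 0.76 and 0.14, in the same range as the reported 77.2% and 14.4%; the remaining gap is what one would expect from rounding of the published rates or from a different evaluation split. The check also shows why F1 is far more sensitive to missed cases than specificity is when fewer than 2% of the cohort is positive.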

Source journal

Journal of Healthcare Informatics Research (Computer Science: Computer Science Applications)

CiteScore: 13.60

Self-citation rate: 1.70%

About the journal: Journal of Healthcare Informatics Research serves as a publication venue for innovative technical contributions highlighting analytics, systems, and human factors research in healthcare informatics. The journal is concerned with the application of computer science principles, information science principles, information technology, and communication technology to address problems in healthcare and everyday wellness. It highlights the most cutting-edge technical contributions in computing-oriented healthcare informatics. The journal covers three major tracks: (1) analytics, which focuses on data analytics, knowledge discovery, and predictive modeling; (2) systems, which focuses on building healthcare informatics systems (e.g., architecture, framework, design, engineering, and application); and (3) human factors, which focuses on understanding users or context, interface design, health behavior, and user studies of healthcare informatics applications.

Topics include but are not limited to:
· healthcare software architecture, framework, design, and engineering
· electronic health records
· medical data mining
· predictive modeling
· medical information retrieval
· medical natural language processing
· healthcare information systems
· smart health and connected health
· social media analytics
· mobile healthcare
· medical signal processing
· human factors in healthcare
· usability studies in healthcare
· user-interface design for medical devices and healthcare software
· health service delivery
· health games
· security and privacy in healthcare
· medical recommender system
· healthcare workflow management
· disease profiling and personalized treatment
· visualization of medical data
· intelligent medical devices and sensors
· RFID solutions for healthcare
· healthcare decision analytics and support systems
· epidemiological surveillance systems and intervention modeling
· consumer and clinician health information needs, seeking, sharing, and use
· semantic Web, linked data, and ontology
· collaboration technologies for healthcare
· assistive and adaptive ubiquitous computing technologies
· statistics and quality of medical data
· healthcare delivery in developing countries
· health systems modeling and simulation
· computer-aided diagnosis