Identifying long-term conditions in New Zealand general practice using structured and unstructured data: a cross-sectional study.

Impact factor: 4.4 (JCR Q1, Health Care Sciences & Services)

BMJ Health & Care Informatics. Pub Date: 2025-05-22. DOI: 10.1136/bmjhci-2024-101393

Yeunhyang Catherine Choi, Katrina Poppe, Vanessa Selak, Allan Ronald Moffitt, Claris Yee Seung Chung, Jane Ullmer, Sue Wells



Abstract

Objectives: This study examined whether incorporating free-text entries into structured general practice records improves the detection of long-term conditions (LTCs) and multimorbidity (MM) in New Zealand (NZ) general practices.

Methods: Data from 374 071 deidentified individuals in general practices were analysed to identify 61 LTCs. Structured data were extracted using Read codes from a national master list, and clinical raters independently identified condition-related free-text, including synonyms, negation terms and common misspellings in randomised samples. Keywords were categorised and refined through ten iterative tests. Programmatic text classification was developed and assessed against gold-standard clinician ratings, using sensitivity, specificity, positive predictive value (PPV) and F₁-score.
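The keyword approach described in the Methods (condition terms including synonyms and common misspellings, plus negation terms) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the term lists, the three-token negation window and the function names are hypothetical and are not taken from the study, whose actual keyword categories and rules are not given in the abstract.

```python
import re

# Hypothetical keyword list for one condition (synonyms, an abbreviation and
# a common misspelling). These are illustrative, not the study's lists.
CONDITION_TERMS = {
    "diabetes": {"diabetes", "diabetic", "t2dm", "diabetis"},
}

# Hypothetical single-word negation cues.
NEGATION_TERMS = {"no", "not", "denies", "without", "nil"}

# Assumption: a negation cue applies if it appears within this many tokens
# before the condition term.
NEGATION_WINDOW = 3


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mentions_condition(text, condition):
    """Return True if the text contains a non-negated term for the condition."""
    tokens = tokenize(text)
    terms = CONDITION_TERMS[condition]
    for i, token in enumerate(tokens):
        if token in terms:
            preceding = tokens[max(0, i - NEGATION_WINDOW):i]
            if not any(word in NEGATION_TERMS for word in preceding):
                return True
    return False


if __name__ == "__main__":
    for note in ["Type 2 diabetes, on metformin",
                 "No evidence of diabetes",
                 "Diabetis review due"]:
        print(note, "->", mentions_condition(note, "diabetes"))
```

A fixed negation window is a deliberately crude rule: it can wrongly suppress a mention when an unrelated negation sits just before the term, which is one reason a classifier of this kind would need to be checked against clinician ratings, as the study did.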

Results: A quarter of general practitioner classifications contained either unrecognised Read codes or consisted of free-text only. Clinician inter-rater reliability was high (kappa ≥0.9). Compared with the clinical gold standard, text classification yielded an average sensitivity of 88%, specificity of 99% and PPV of 95%, with an F₁-score range of 82%-95%. Incorporating free text increased LTC prevalence from 42.1% to 46.3%, reducing misclassification of MM diagnoses by identifying 12 626 additional patients with MM and 15 972 additional patients with at least one LTC.
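As a reading aid for the reported figures, the sketch below shows how the four metrics are defined from a confusion matrix and how the headline numbers relate to one another. The confusion-matrix counts are hypothetical, chosen only to exercise the formulas. The F₁ value computed from the average sensitivity and PPV is illustrative: it is not the study's per-condition F₁, which the abstract reports only as a range.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positives among gold-standard positives
    specificity = tn / (tn + fp)   # true negatives among gold-standard negatives
    ppv = tp / (tp + fp)           # true positives among predicted positives
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    return sensitivity, specificity, ppv, f1


def f1_score(sensitivity, ppv):
    """F1 is the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Hypothetical counts, for illustration only (not from the study).
example = metrics_from_counts(tp=88, fp=5, tn=895, fn=12)

# F1 implied by the reported average sensitivity (88%) and PPV (95%).
f1_from_averages = f1_score(0.88, 0.95)

# Share of the 374 071 individuals represented by the 15 972 additional
# patients with at least one LTC, in percentage points.
added_prevalence_points = 15972 / 374071 * 100

if __name__ == "__main__":
    print("example metrics:", [round(m, 3) for m in example])
    print("F1 from averages:", round(f1_from_averages, 3))
    print("added prevalence (points):", round(added_prevalence_points, 2))
```

The F₁ from the averages comes out at about 0.91, inside the reported 82%-95% range, and the additional patients correspond to roughly 4.3 percentage points of the cohort, in line with the reported rise from 42.1% to 46.3% allowing for rounding.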

Discussion: In the course of their workflow, general practitioners face barriers to accurate LTC coding or may simply annotate with text-based descriptions. Programmatic text classification has demonstrated high performance and identified many more patients receiving LTC care.

Conclusions: Combining structured and unstructured data optimises MM detection in NZ general practices and has the potential to improve case management, follow-up care and allocation of healthcare resources.


Source journal