Belay Birlie Yimer, Fangyuan Zhang, Jenny Humphreys, Mark Lunt, Meghna Jani, John McBeth, William G Dixon
American Journal of Epidemiology. Published 2025-09-17. DOI: 10.1093/aje/kwaf206 (https://doi.org/10.1093/aje/kwaf206)
Citations: 0
Improving disease misclassification and prevalence estimates by linking primary and secondary care electronic health records: an illustration from arthritis research.
Prevalence estimates based on primary care health data identify cases via code lists. Validation studies can discover and exclude false positives, but it is often difficult or impossible to find false negatives. Using psoriatic arthritis (PsA) as an example, this study aimed to examine the extent of misclassification and adjust for it by linking primary care records with text-mined outpatient letters from a North-West regional hospital (2014-2019). A total of 245 cases of PsA were identified among 188,286 adults registered with primary care, giving an observed prevalence of 0.13% [95% CI 0.11%-0.15%]. Among a subgroup of 7,532 primary care patients attending the hospital rheumatology clinic, 202 had a primary care PsA code: 188 were confirmed as true PsA, while 14 were false positives. Primary care codes failed to identify 196 hospital-diagnosed PsA cases, leading to a more than two-fold underestimation. The adjusted prevalence, accounting for misclassification, was 0.25% [95% CI 0.21%-0.28%]. Linking primary care with hospital records identified both false positives and false negatives, enabling correction of prevalence estimates. This highlights the value of text-mining hospital letters to compensate for the national absence of coded secondary care diagnosis data from outpatient departments, and the importance of considering the impact of false negatives.
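The counts reported in the abstract can be checked with a short calculation. The Python sketch below recomputes the observed prevalence with a Wald 95% confidence interval, derives the positive predictive value (PPV) and sensitivity of the primary care code from the rheumatology subgroup, and applies the simple correction adjusted = observed × PPV / sensitivity. This is an illustrative back-of-envelope correction and not necessarily the authors' method: it assumes the subgroup's PPV and sensitivity carry over to the whole registered population, and the function and variable names are made up for this example.

```python
import math


def wald_ci(cases, population, z=1.96):
    """Return (proportion, lower, upper) using a Wald (normal-approximation) interval."""
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)
    return p, p - z * se, p + z * se


# Whole registered population (counts taken from the abstract).
observed, ci_low, ci_high = wald_ci(245, 188_286)

# Rheumatology clinic subgroup (counts taken from the abstract).
true_pos = 188   # primary care PsA code, confirmed by hospital letters
false_pos = 14   # primary care PsA code, not confirmed
false_neg = 196  # hospital-diagnosed PsA with no primary care code

# Accuracy of the primary care code, estimated in the subgroup only.
ppv = true_pos / (true_pos + false_pos)
sensitivity = true_pos / (true_pos + false_neg)

# Assumption: PPV and sensitivity from the subgroup apply to everyone.
# Multiplying by PPV removes false positives; dividing by sensitivity
# adds back the cases that the code list missed.
adjusted = observed * ppv / sensitivity

print(f"observed prevalence: {observed:.2%} ({ci_low:.2%} to {ci_high:.2%})")
print(f"PPV: {ppv:.3f}, sensitivity: {sensitivity:.3f}")
print(f"adjusted prevalence: {adjusted:.2%}")
```

Under these assumptions the point estimates round to the reported 0.13% and 0.25%. The sketch does not attempt to reproduce the confidence interval for the adjusted prevalence, which would also need to reflect uncertainty in the PPV and sensitivity estimates.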
About the journal:
The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research.
It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.