Regina Prigge, Kelly J Fleetwood, Caroline A Jackson, Stewart W Mercer, Paul At Kelly, Cathie Sudlow, John D Norrie, Daniel R Morales, Daniel J Smith, Bruce Guthrie
Communications medicine, 5(1):283. Published 2025-07-08. DOI: https://doi.org/10.1038/s43856-025-00995-4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12238475/pdf/
Robustly measuring multimorbidity using disparate linked datasets.
Background: Measurement of multimorbidity, the co-occurrence of two or more conditions in the same individual, is highly variable, which limits the consistency and reproducibility of research.
Methods: Using data from 172,563 UK Biobank (UKB) participants and a cross-sectional approach, we examined how the choice of data source affected the estimated prevalence of 80 individual long-term conditions (LTCs) and of multimorbidity. We developed code-list-based algorithms to determine the prevalence of the 80 LTCs in (1) primary care records, (2) UKB baseline assessment, (3) hospital/cancer registry records, and (4) all three data sources together.
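A minimal sketch of what a code-list-based algorithm of this kind might look like, assuming each record is a `(participant_id, source, code)` tuple. The condition names, source labels, and codes below are hypothetical placeholders, not the study's code lists, and the matching rule (exact code membership) is a simplifying assumption.

```python
# Hypothetical code lists: condition -> source -> set of codes.
# These codes are placeholders for illustration, not real clinical codes.
CODE_LISTS = {
    "diabetes": {
        "primary_care": {"pc-diab"},
        "baseline": {"sr-diab"},
        "hospital": {"hosp-diab"},
    },
    "depression": {
        "primary_care": {"pc-dep"},
        "baseline": {"sr-dep"},
        "hospital": {"hosp-dep"},
    },
}


def ascertain(records, code_lists=CODE_LISTS):
    """Return {condition: {source: set of participant ids}}.

    A participant counts as a case in a source if any of their records
    from that source carries a code on that condition's list.
    """
    cases = {
        cond: {source: set() for source in sources}
        for cond, sources in code_lists.items()
    }
    for pid, source, code in records:
        for cond, sources in code_lists.items():
            if code in sources.get(source, ()):
                cases[cond][source].add(pid)
    return cases


def prevalence(cases, n_participants):
    """Prevalence per source, plus the union of all sources together."""
    out = {}
    for cond, by_source in cases.items():
        row = {s: len(ids) / n_participants for s, ids in by_source.items()}
        combined = set().union(*by_source.values())
        row["all_sources"] = len(combined) / n_participants
        out[cond] = row
    return out


# Toy records for four participants.
records = [
    (1, "primary_care", "pc-diab"),
    (1, "hospital", "hosp-diab"),
    (2, "baseline", "sr-diab"),
    (3, "primary_care", "pc-dep"),
]
cases = ascertain(records)
prev = prevalence(cases, n_participants=4)
```

In this toy example the combined-source prevalence of the placeholder "diabetes" condition is higher than in any single source, which is the kind of source dependence the study sets out to quantify.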
Results: Using records from all three data sources, 146,811 (85.1%) participants have at least one LTC and 109,609 (63.5%) have at least two LTCs at baseline. A median of 4.7% (IQR 1.0-16.6) of participants with a condition are identified by all three data sources. Agreement is highest for endocrine, nutritional and metabolic disorders, with a median of 32.9% (IQR 20.5-34.1) of individuals with a condition identified by all three data sources. Agreement is lowest for diseases of the genitourinary system and for mental and behavioural disorders, where the proportion identified by all three data sources ranges across conditions from zero to 4.9% and from zero to 12.3%, respectively. This low agreement between data sources is accompanied by high proportions of individuals with a condition identified only in primary care data (i.e. not in either of the other two sources), with a median of 59.3% (IQR 47.4-75.9) for diseases of the genitourinary system and 66.9% (IQR 42.8-79.2) for mental and behavioural disorders.
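The agreement figures above can be illustrated with a short sketch. It assumes the denominator is everyone identified with the condition by at least one source, and it uses the "inclusive" quartile method from Python's `statistics` module; both are assumptions, since the abstract does not specify them. The function names and the toy data are hypothetical.

```python
from statistics import quantiles


def pct(count, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / total, 1)


def agreement(primary_care, baseline, hospital):
    """Agreement for one condition, given sets of participant ids per source.

    Denominator (assumption): participants identified by any of the sources.
    """
    anyone = primary_care | baseline | hospital
    if not anyone:
        return None  # condition not observed in any source
    return {
        "all_three_pct": 100 * len(primary_care & baseline & hospital) / len(anyone),
        "primary_care_only_pct": 100 * len(primary_care - baseline - hospital) / len(anyone),
    }


def median_iqr(values):
    """Median and quartiles across conditions (quartile method is an assumption)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# The headline percentages follow directly from the reported counts.
at_least_one = pct(146_811, 172_563)
at_least_two = pct(109_609, 172_563)

# Toy example for a single condition.
toy = agreement(primary_care={1, 2, 3, 4}, baseline={1, 5}, hospital={1, 2})

# Toy per-condition values summarised as median (IQR).
summary = median_iqr([0, 10, 20, 30, 40])
```

In the toy example, one of five identified participants appears in all three sources and two appear only in primary care, mirroring the pattern of low three-way agreement alongside a large primary-care-only share.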
Conclusions: Our study highlights how strongly the choice of data source affects research on individual LTCs and multimorbidity, and the importance of clearly justifying the choices made.