Ann-Sophie Buchardt, Pi Vejsig Madsen, Andreas Jensen
Clinical Epidemiology, vol. 17, pp. 147-163, published 2025-02-21. DOI: 10.2147/CLEP.S500800
Data-Driven Algorithms for Classification of In- and Outpatients in the Danish National Patient Register.
Purpose: The Danish National Patient Register (DNPR) is an important research data source that provides detailed information on all hospital contacts in Denmark. With the transition from the second version of the DNPR (DNPR2) to the third version (DNPR3) in early 2019, the patient type variable (inpatient, elective outpatient, acute outpatient) was removed. This study proposes and evaluates algorithms that classify hospital contacts in DNPR3 into these categories, with the aim of establishing a consensus on data interpretation among researchers who use the Danish registries.
Patients and methods: We analyzed somatic public hospital contacts in Denmark from 2017 to 2020, with 20,882,018 unique contacts in DNPR2 and 27,694,584 in DNPR3. Several classification algorithms were developed and assessed, including department-based, contact-based, and hybrid methods, to infer patient types in DNPR3 from contact features such as duration and admission type. Because the true patient type is unknown in DNPR3, proxy labels were used to train the classification algorithms.
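The abstract does not give the decision rules used for contact-based classification. Purely as an illustration, a rule-based contact classifier could look like the sketch below, assuming each contact carries a start time, an end time, and a flag derived from the admission type. The 12-hour inpatient cut-off, the `Contact` class, and `classify_contact` are all hypothetical and are not the authors' criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cut-off: contacts lasting at least this long are treated as
# inpatient stays. Illustrative assumption only, not taken from the study.
INPATIENT_MIN_DURATION = timedelta(hours=12)


@dataclass
class Contact:
    start: datetime
    end: datetime
    acute_admission: bool  # hypothetical flag derived from the admission type


def classify_contact(contact: Contact) -> str:
    """Assign one of the three DNPR2-style patient types to a contact.

    Rule-based sketch: long contacts are labelled inpatients; shorter
    contacts are split into acute or elective outpatients by admission type.
    """
    duration = contact.end - contact.start
    if duration >= INPATIENT_MIN_DURATION:
        return "inpatient"
    if contact.acute_admission:
        return "acute outpatient"
    return "elective outpatient"


if __name__ == "__main__":
    visit = Contact(datetime(2020, 3, 1, 9, 0), datetime(2020, 3, 1, 10, 30),
                    acute_admission=False)
    print(classify_contact(visit))  # elective outpatient
```

Any real cut-off would have to be chosen and validated against reference labels, such as the DNPR2 patient type variable, rather than fixed by hand as in this sketch.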
Results: Compared with the true patient type variable in DNPR2, our department-based classifier achieved high positive predictive values (PPVs) and sensitivities in DNPR2, with PPVs ranging from 95.6% to 99.5% and sensitivities from 94.1% to 99.6% across patient types. The hybrid approach showed improved PPVs and sensitivities for acute (PPV = 97.3%, sensitivity = 96.8%) and elective (PPV = 99.8%, sensitivity = 99.9%) outpatients. In both DNPR2 and DNPR3, the contact-based classification algorithms showed high agreement with one another, indicating that the classification methods are robust and suggesting that the data contain inherent patterns.
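The metrics reported in the results follow the usual definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN), computed one-vs-rest for each patient type against a reference label (for DNPR2, the original patient type variable). The sketch below assumes labels are plain strings and reports values in percent, matching the scale used in the abstract. The abstract does not name the statistic used to measure agreement between classifiers; percent agreement and Cohen's kappa are included only as two common options, and all function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def ppv_and_sensitivity(y_true: Sequence[str], y_pred: Sequence[str],
                        label: str) -> Tuple[float, float]:
    """One-vs-rest PPV and sensitivity, in percent, for one patient type."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == label for p in y_pred)  # TP + FP
    actual_pos = sum(t == label for t in y_true)     # TP + FN
    ppv = 100 * tp / predicted_pos if predicted_pos else float("nan")
    sensitivity = 100 * tp / actual_pos if actual_pos else float("nan")
    return ppv, sensitivity


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of contacts (in percent) given the same label by two classifiers."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two classifiers (Cohen's kappa)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected if the two classifiers labelled contacts independently
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1:
        return float("nan")  # kappa is undefined when expected agreement is 1
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    y_true = ["inpatient", "inpatient", "acute outpatient",
              "elective outpatient", "elective outpatient"]
    y_pred = ["inpatient", "acute outpatient", "acute outpatient",
              "elective outpatient", "elective outpatient"]
    for label in sorted(set(y_true)):
        print(label, ppv_and_sensitivity(y_true, y_pred, label))
    print(percent_agreement(y_true, y_pred), round(cohens_kappa(y_true, y_pred), 3))
```

Kappa is shown alongside raw agreement because, with very unbalanced patient-type frequencies, two classifiers can agree on most contacts largely by chance.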
Conclusion: Our study shows that, depending on the available data, all presented classification methods are suitable for categorizing patient types in DNPR2. The methods also proved robust, which supports their suitability for classification in DNPR3. Future research should explore advanced techniques and a comprehensive classification of departments to improve accuracy and applicability.
About the journal:
Clinical Epidemiology is an international, peer reviewed, open access journal. Clinical Epidemiology focuses on the application of epidemiological principles and questions relating to patients and clinical care in terms of prevention, diagnosis, prognosis, and treatment.
Clinical Epidemiology welcomes papers covering these topics in the form of original research and systematic reviews.
Clinical Epidemiology has a special interest in international electronic medical patient records and other routine health care data, especially as applied to safety of medical interventions, clinical utility of diagnostic procedures, understanding short- and long-term clinical course of diseases, clinical epidemiological and biostatistical methods, and systematic reviews.
When considering submission of a paper that uses publicly available data, authors should ensure that the study adds significantly to the body of knowledge and uses appropriate validated methods for identifying health outcomes.
The journal has launched special series describing existing data sources for clinical epidemiology, international health care systems and validation studies of algorithms based on databases and registries.