Siona Prasad, Patricia C Dykes, Richard Schreiber, Shadi Hijjawi, Khalid Nawab, Alice Kim, Stuart Lipsitz, Ania Syrowatka, Lipika Samal, David W Bates, Veysel Karani Baris, Tien Thai, Michael Sainlaire, Frank Y Chang, John Novoa-Laurentiev, Gregory Piazza, Wenyu Song
American Journal of Hematology (impact factor 9.9; JCR Q1, Hematology). Published 2025-10-02. DOI: 10.1002/ajh.70096 (https://doi.org/10.1002/ajh.70096)
Development of an Algorithm for Estimating the Likelihood of Venous Thromboembolism in Primary Care Using Structured and Unstructured Electronic Health Record Data.
Venous thromboembolism (VTE) is a major public health concern. It is often clinically difficult to diagnose and affects up to 900 000 individuals annually in the United States. Delayed or missed VTE diagnosis can impact treatment and increase morbidity and mortality. This retrospective study utilized structured and unstructured electronic health record (EHR) data from a large integrated care network in the northeastern US, focusing on 4678 adult patients presenting with at least one VTE-associated sign or symptom at primary care visits during 2019-2020. Feature selection incorporated expert-guided and data-driven approaches, resulting in a final set of demographic, clinical history, and sign/symptom risk factors. The primary analysis developed seven machine learning models to predict VTE incidence. Secondary analyses included the prediction of timely and delayed VTE diagnoses. All models showed predictive ability with area under the curve (AUC) of 0.83-0.88. The logistic regression model demonstrated robust performance in predicting incident VTE cases, achieving an AUC of 0.88 (95% CI: 0.86-0.90). Multiple risk factors were identified, including cancer history, smoking history, and spinal cord trauma. Variations in the top risk factors between timely and delayed prediction models highlighted how certain patients were more likely to have a delayed or missed diagnosis. This study highlights the potential for data-driven tools to facilitate timely, point-of-care VTE detection by leveraging structured and unstructured EHR data. The prediction model accurately estimated the likelihood of incident VTEs, especially in cases diagnosed late, showing potential to reduce costly diagnostic delays.
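The abstract reports discrimination as an AUC with a 95% confidence interval but does not say how either was computed. As a minimal sketch of what such a figure means, the Python below computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and estimates a percentile bootstrap interval. The function names, the synthetic scores, and the choice of a bootstrap interval are all assumptions made for illustration; none of this is the study's data or the authors' method.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative choice)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic scores only: 40 hypothetical positives, 120 hypothetical negatives.
_rng = random.Random(42)
demo_labels = [1] * 40 + [0] * 120
demo_scores = [_rng.gauss(1.5, 1.0) for _ in range(40)] + [
    _rng.gauss(0.0, 1.0) for _ in range(120)
]

point = auc(demo_labels, demo_scores)
ci_low, ci_high = bootstrap_ci(demo_labels, demo_scores)
print(f"AUC = {point:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported range of 0.83-0.88 indicates that the models ranked a VTE case above a non-case in well over four of every five such pairs.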
About the journal:
The American Journal of Hematology offers extensive coverage of experimental and clinical aspects of blood diseases in humans and animal models. The journal publishes original contributions in both non-malignant and malignant hematological diseases, encompassing clinical and basic studies in areas such as hemostasis, thrombosis, immunology, blood banking, and stem cell biology. Clinical translational reports highlighting innovative therapeutic approaches for the diagnosis and treatment of hematological diseases are actively encouraged. The American Journal of Hematology features regular original laboratory and clinical research articles, brief research reports, critical reviews, images in hematology, as well as letters and correspondence.