Akshay Swaminathan, Ivan Lopez, William Wang, Ujwal Srivastava, Edward Tran, A. Bhargava-Shah, Janet Y Wu, Alexander L Ren, Kaitlin Caoili, Brandon Bui, L. Alkhani, Susan Lee, Nathan Mohit, Noel Seo, N. Macedo, Winson Cheng, Charles Liu, Reena Thomas, Jonathan H Chen, O. Gevaert
{"title":"Selective prediction for extracting unstructured clinical data","authors":"Akshay Swaminathan, Ivan Lopez, William Wang, Ujwal Srivastava, Edward Tran, A. Bhargava-Shah, Janet Y Wu, Alexander L Ren, Kaitlin Caoili, Brandon Bui, L. Alkhani, Susan Lee, Nathan Mohit, Noel Seo, N. Macedo, Winson Cheng, Charles Liu, Reena Thomas, Jonathan H Chen, O. Gevaert","doi":"10.1101/2022.11.15.22282368","DOIUrl":"https://doi.org/10.1101/2022.11.15.22282368","url":null,"abstract":"Background: Electronic health records represent a large data source for outcomes research, but the majority of EHR data is unstructured (e.g. free text of clinical notes) and not conducive to computational methods. While there are currently approaches to handle unstructured data, such as manual abstraction, structured proxy variables, and model-assisted abstraction, these methods are time-consuming, not scalable, and require clinical domain expertise. This paper aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. Methods: We trained selective prediction models to identify the presence of four distinct clinical variables in free-text pathology reports: primary cancer diagnosis of glioblastoma (GBM, n = 659), resection of rectal adenocarcinoma (RRA, n = 601), and two procedures for resection of rectal adenocarcinoma: abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601). Data were manually abstracted from pathology reports and used to train L1-regularized logistic regression models using term-frequency-inverse-document-frequency features. Data points that the model was unable to predict with high certainty were manually abstracted. Findings: All four selective prediction models achieved a test-set sensitivity, specificity, positive predictive value, and negative predictive value above 0.91. 
The use of selective prediction led to sizable gains in automation (anywhere from 57% to 95% reduction in manual abstraction of charts across the four outcomes). For our GBM classifier, the selective prediction model saw improvements to sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier. Interpretation: Selective prediction using utility-based probability thresholds can facilitate unstructured data extraction by giving \"easy\" charts to a model and \"hard\" charts to human abstractors, thus increasing efficiency while maintaining or improving accuracy.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131867440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Leese, A. Anand, A. Girvin, A. Manna, S. Patel, Y. J. Yoo, R. Wong, M. Haendel, C. Chute, T. Bennett, J. Hajagos, E. Pfaff, R. Moffitt
{"title":"Clinical encounter heterogeneity and methods for resolving in networked EHR data: A study from N3C and RECOVER programs","authors":"P. Leese, A. Anand, A. Girvin, A. Manna, S. Patel, Y. J. Yoo, R. Wong, M. Haendel, C. Chute, T. Bennett, J. Hajagos, E. Pfaff, R. Moffitt","doi":"10.1101/2022.10.14.22281106","DOIUrl":"https://doi.org/10.1101/2022.10.14.22281106","url":null,"abstract":"OBJECTIVE: Clinical encounter data are heterogeneous and vary greatly from institution to institution. These problems of variance affect interpretability and usability of clinical encounter data for analysis. These problems are magnified when multi-site electronic health record data are networked together. This paper presents a novel, generalizable method for resolving encounter heterogeneity for analysis by combining related atomic encounters into composite macrovisits. MATERIALS AND METHODS: Encounters were composed of data from 75 partner sites harmonized to a common data model as part of the NIH Researching COVID to Enhance Recovery Initiative, a project of the National Covid Cohort Collaborative. Summary statistics were computed for overall and site-level data to assess issues and identify modifications. Two algorithms were developed to refine atomic encounters into cleaner, analyzable longitudinal clinical visits. RESULTS: Atomic inpatient encounters data were found to be widely disparate between sites in terms of length-of-stay and numbers of OMOP CDM measurements per encounter. After aggregating encounters to macrovisits, variance of length-of-stay (LOS) and measurement frequency decreased. A subsequent algorithm to identify hospitalized macrovisits further reduced data variability. DISCUSSION: Encounters data are a complex and heterogeneous component of EHR data and these issues are not addressed by existing methods. 
These types of complex and poorly studied issues contribute to the difficulty of deriving value from EHR data, and these types of foundational, large-scale explorations and developments are necessary to realize the full potential of modern real world data. CONCLUSION: This paper presents method developments to work with and resolve EHR encounters data in a generalizable way as a foundation for future analyses and research.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127470005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Liu, J. Herrin, C. Huang, R. Khera, L. Dhingra, W. Dong, B. Mortazavi, H. Krumholz, Y. Lu
{"title":"Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys","authors":"Y. Liu, J. Herrin, C. Huang, R. Khera, L. Dhingra, W. Dong, B. Mortazavi, H. Krumholz, Y. Lu","doi":"10.1101/2022.09.30.22280471","DOIUrl":"https://doi.org/10.1101/2022.09.30.22280471","url":null,"abstract":"ABSTRACT Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys. Methods: We used the 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanation (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set. Results: Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) had the best performance across multiple types of supervised ML algorithms. 
Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P < 0.01 for both). Conclusion: Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms. Keywords: Machine learning, GBDTs, Cardiorespiratory fitness, VO2max, NHANES","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116502434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: A bias evaluation checklist for predictive models and its pilot application for 30-day hospital readmission models","authors":"","doi":"10.1093/jamia/ocac102","DOIUrl":"https://doi.org/10.1093/jamia/ocac102","url":null,"abstract":"","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117085519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. X. Xie, Qiuzhe Chen, C. Hincapié, Léonie Hofstetter, C. Maher, G. Machado
{"title":"Effectiveness of clinical dashboards as audit and feedback or clinical decision support tools on medication use and test ordering: a systematic review of randomized controlled trials","authors":"C. X. Xie, Qiuzhe Chen, C. Hincapié, Léonie Hofstetter, C. Maher, G. Machado","doi":"10.1093/jamia/ocac094","DOIUrl":"https://doi.org/10.1093/jamia/ocac094","url":null,"abstract":"Abstract Background Clinical dashboards used as audit and feedback (A&F) or clinical decision support systems (CDSS) are increasingly adopted in healthcare. However, their effectiveness in changing the behavior of clinicians or patients is still unclear. This systematic review aims to investigate the effectiveness of clinical dashboards used as CDSS or A&F tools (as a standalone intervention or part of a multifaceted intervention) in primary care or hospital settings on medication prescription/adherence and test ordering. Methods Seven major databases were searched for relevant studies, from inception to August 2021. Two authors independently extracted data, assessed the risk of bias using the Cochrane RoB II scale, and evaluated the certainty of evidence using GRADE. Data on trial characteristics and intervention effect sizes were extracted. A narrative synthesis was performed to summarize the findings of the included trials. Results Eleven randomized trials were included. Eight trials evaluated clinical dashboards as standalone interventions and provided conflicting evidence on changes in antibiotic prescribing and no effects on statin prescribing compared to usual care. Dashboards increased medication adherence in patients with inflammatory arthritis but not in kidney transplant recipients. Three trials investigated dashboards as part of multicomponent interventions revealing decreased use of opioids for low back pain, increased proportion of patients receiving cardiovascular risk screening, and reduced antibiotic prescribing for upper respiratory tract infections. 
Conclusion There is limited evidence that dashboards integrated into electronic medical record systems and used as feedback or decision support tools may be associated with improvements in medication use and test ordering.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"78 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113933244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gabriel A Carrillo, M. Cohen-Wolkowiez, E. D'agostino, K. Marsolo, Lisa M. Wruck, Laura Johnson, James Topping, Al Richmond, Giselle Corbie, W. Kibbe
{"title":"Standardizing, harmonizing, and protecting data collection to broaden the impact of COVID-19 research: the rapid acceleration of diagnostics-underserved populations (RADx-UP) initiative","authors":"Gabriel A Carrillo, M. Cohen-Wolkowiez, E. D'agostino, K. Marsolo, Lisa M. Wruck, Laura Johnson, James Topping, Al Richmond, Giselle Corbie, W. Kibbe","doi":"10.1093/jamia/ocac097","DOIUrl":"https://doi.org/10.1093/jamia/ocac097","url":null,"abstract":"Abstract Objective The Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program is a consortium of community-engaged research projects with the goal of increasing access to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) tests in underserved populations. To accelerate clinical research, common data elements (CDEs) were selected and refined to standardize data collection and enhance cross-consortium analysis. Materials and Methods The RADx-UP consortium began with more than 700 CDEs from the National Institutes of Health (NIH) CDE Repository, Disaster Research Response (DR2) guidelines, and the PHENotypes and eXposures (PhenX) Toolkit. Following a review of initial CDEs, we made selections and further refinements through an iterative process that included live forums, consultations, and surveys completed by the first 69 RADx-UP projects. Results Following a multistep CDE development process, we decreased the number of CDEs, modified the question types, and changed the CDE wording. Most research projects were willing to collect and share demographic NIH Tier 1 CDEs, with the top exception reason being a lack of CDE applicability to the project. The NIH RADx-UP Tier 1 CDE with the lowest frequency of collection and sharing was sexual orientation. Discussion We engaged a wide range of projects and solicited bidirectional input to create CDEs. 
These RADx-UP CDEs could serve as the foundation for a patient-centered informatics architecture allowing the integration of disease-specific databases to support hypothesis-driven clinical research in underserved populations. Conclusion A community-engaged approach using bidirectional feedback can lead to the better development and implementation of CDEs in underserved populations during public health emergencies.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123982238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diba Khan, Meeyoung Park, Samuel Lerma, Stephen Soroka, D. Gaughan, L. Bottichio, Monika Bray, Mary Fukushima, B. Bregman, Caleb Wiedeman, William Duck, Deborah Dee, A. Gundlapalli, A. Suthar
{"title":"Improving efficiency of COVID-19 aggregate case and death surveillance data transmission for jurisdictions: current and future role of application programming interfaces (APIs)","authors":"Diba Khan, Meeyoung Park, Samuel Lerma, Stephen Soroka, D. Gaughan, L. Bottichio, Monika Bray, Mary Fukushima, B. Bregman, Caleb Wiedeman, William Duck, Deborah Dee, A. Gundlapalli, A. Suthar","doi":"10.1093/jamia/ocac090","DOIUrl":"https://doi.org/10.1093/jamia/ocac090","url":null,"abstract":"Abstract During the coronavirus disease-2019 (COVID-19) pandemic, the Centers for Disease Control and Prevention (CDC) supplemented traditional COVID-19 case and death reporting with COVID-19 aggregate case and death surveillance (ACS) to track daily cumulative numbers. Later, as public health jurisdictions (PHJs) revised the historical COVID-19 case and death data due to data reconciliation and updates, CDC devised a manual process to update these records in the ACS dataset for improving the accuracy of COVID-19 case and death data. Automatic data transfer via an application programming interface (API), an intermediary that enables software applications to communicate, reduces the time and effort in transferring data from PHJs to CDC. However, APIs must meet specific content requirements for use by CDC. As of March 2022, CDC has integrated APIs from 3 jurisdictions for COVID-19 ACS. Expanded use of APIs may provide efficiencies for COVID-19 and other emergency response planning efforts as evidenced by this proof-of-concept. 
In this article, we share the utility of APIs in COVID-19 ACS.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132409272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raja A. Cholan, Gregory Pappas, Greg Rehwoldt, A. Sills, Elizabeth D. Korte, I. K. Appleton, Natalie M Scott, W. Rubinstein, Sara A. Brenner, Riki Merrick, W. Hadden, K. E. Campbell, Michael S. Waters
{"title":"Encoding laboratory testing data: case studies of the national implementation of HHS requirements and related standards in five laboratories","authors":"Raja A. Cholan, Gregory Pappas, Greg Rehwoldt, A. Sills, Elizabeth D. Korte, I. K. Appleton, Natalie M Scott, W. Rubinstein, Sara A. Brenner, Riki Merrick, W. Hadden, K. E. Campbell, Michael S. Waters","doi":"10.1093/jamia/ocac072","DOIUrl":"https://doi.org/10.1093/jamia/ocac072","url":null,"abstract":"Abstract Objective Assess the effectiveness of providing Logical Observation Identifiers Names and Codes (LOINC®)-to-In Vitro Diagnostic (LIVD) coding specification, required by the United States Department of Health and Human Services for SARS-CoV-2 reporting, in medical center laboratories and utilize findings to inform future United States Food and Drug Administration policy on the use of real-world evidence in regulatory decisions. Materials and Methods We compared gaps and similarities between diagnostic test manufacturers’ recommended LOINC® codes and the LOINC® codes used in medical center laboratories for the same tests. Results Five medical centers and three test manufacturers extracted data from laboratory information systems (LIS) for prioritized tests of interest. The data submission ranged from 74 to 532 LOINC® codes per site. Three test manufacturers submitted 15 LIVD catalogs representing 26 distinct devices, 6956 tests, and 686 LOINC® codes. We identified mismatches in how medical centers use LOINC® to encode laboratory tests compared to how test manufacturers encode the same laboratory tests. Of 331 tests available in the LIVD files, 136 (41%) were represented by a mismatched LOINC® code by the medical centers (chi-square 45.0, 4 df, P < .0001). Discussion The five medical centers and three test manufacturers vary in how they organize, categorize, and store LIS catalog information. This variation impacts data quality and interoperability. 
Conclusion The results of the study indicate that providing the LIVD mappings was not sufficient to support laboratory data interoperability. National implementation of LIVD and further efforts to promote laboratory interoperability will require a more comprehensive effort and continuing evaluation and quality control.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121867180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Maurits, I. Korsunsky, S. Raychaudhuri, S. Murphy, J. Smoller, Scott T. Weiss, L. Petukhova, C. Weng, Wei-Qi Wei, T. Huizinga, M. Reinders, E. Karlson
{"title":"Correction to: A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history","authors":"M. Maurits, I. Korsunsky, S. Raychaudhuri, S. Murphy, J. Smoller, Scott T. Weiss, L. Petukhova, C. Weng, Wei-Qi Wei, T. Huizinga, M. Reinders, E. Karlson","doi":"10.1093/jamia/ocac080","DOIUrl":"https://doi.org/10.1093/jamia/ocac080","url":null,"abstract":"This is a correction to: Marc P Maurits, Ilya Korsunsky, Soumya Raychaudhuri, Shawn N Murphy, Jordan W Smoller, Scott T Weiss, Lynn M. Petukhova, Chunhua Weng, Wei-Qi Wei, Thomas W J Huizinga, Marcel J T Reinders, Elizabeth W Karlson, Erik B van den Akker, Rachel Knevel, eMERGE Consortium, A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history, Journal of the American Medical Informatics Association, Volume 29, Issue 5, May 2022, Pages 761–769, https://doi.org/10.1093/jamia/ocac008","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131616183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Research Data Warehouse Best Practices: Catalyzing National Data Sharing through Informatics Innovation","authors":"","doi":"10.1093/jamia/ocac075","DOIUrl":"https://doi.org/10.1093/jamia/ocac075","url":null,"abstract":"","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114081448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}