Ada H Tsoi, Gary Gartner, Steven W Cotten, John Kim, John Nazarian, Joseph Thomas, Steven David McSwain, Rachini Ahmadi-Moosavi, Ram Rimal
{"title":"Establishing and implementing a responsible artificial intelligence framework: a 1-year review.","authors":"Ada H Tsoi, Gary Gartner, Steven W Cotten, John Kim, John Nazarian, Joseph Thomas, Steven David McSwain, Rachini Ahmadi-Moosavi, Ram Rimal","doi":"10.1093/jamia/ocaf147","DOIUrl":"https://doi.org/10.1093/jamia/ocaf147","url":null,"abstract":"<p><strong>Objective: </strong>This work highlights successes and challenges of implementing a novel responsible artificial intelligence (RAI) framework, emphasizing healthcare disciplines needed to operationalize it.</p><p><strong>Materials and methods: </strong>UNC Health developed an RAI framework to assess artificial intelligence (AI) solutions, featuring a 21-question intake survey aligned with institutional goals to promote fairness, transparency, accountability, and trustworthiness, and evaluated by clinical, analytical, and operational experts.</p><p><strong>Results: </strong>Twelve survey evaluations revealed low fairness scores and resulted in 83% conditional approvals.</p><p><strong>Discussion: </strong>Learnings included the importance of representative training datasets, systematic evaluation of vendor-provided models, and robust post-implementation monitoring. Challenges included the infrequency of analyses stratified by demographics, limited vendor transparency, and reliance on volunteer engagement for survey evaluations.</p><p><strong>Conclusions: </strong>Our framework provides a roadmap to assess AI tools in healthcare but requires overcoming implementation barriers like resource constraints and vendor cooperation. 
Future iterations should consider tiered evaluations based on risk likelihood and member engagement for scalability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145092738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Levi Kaster, Ethan Hillis, Inez Y Oh, Elizabeth C Cordell, Randi E Foraker, Albert M Lai, Stephanie M Morris, David H Gutmann, Philip R O Payne, Aditi Gupta
{"title":"Comparison of rule- and large language model-based phenotype extraction from clinical notes for neurofibromatosis type 1.","authors":"Levi Kaster, Ethan Hillis, Inez Y Oh, Elizabeth C Cordell, Randi E Foraker, Albert M Lai, Stephanie M Morris, David H Gutmann, Philip R O Payne, Aditi Gupta","doi":"10.1093/jamia/ocaf155","DOIUrl":"https://doi.org/10.1093/jamia/ocaf155","url":null,"abstract":"<p><strong>Introduction: </strong>Neurofibromatosis type 1 (NF1) is a rare genetic disorder affecting multiple organ systems with significant clinical heterogeneity. Managing individuals with NF1 is challenging due to variability in disease progression and outcomes and limited early risk assessment tools.</p><p><strong>Objective: </strong>This study aims to develop an effective, generalizable, user-friendly clinical entity extraction pipeline for identifying NF1-related phenotypes from unstructured clinical notes to enhance research and risk-modeling efforts. We compare the benefits of rule-based natural language processing (NLP) vs large language models (LLMs) for this purpose.</p><p><strong>Materials and methods: </strong>Four phenotype extraction pipelines (3 LLM-based vs 1 rule-based) were developed to automatically extract selected NF1-relevant phenotypes. Subject matter experts manually reviewed clinical notes, generating a gold-standard annotation dataset for evaluation. In Phase 1, notes authored by a single NF1 physician were used to guide pipeline development and refinement. In Phase 2, notes from a second NF1 physician were used to assess pipeline generalizability, followed by further refinement to accommodate differences in physician terminology.</p><p><strong>Results: </strong>With refinement, the rule-based model had higher distributions of F1 scores than the LLMs in both Phase 1 and Phase 2. 
However, the LLMs generalized better between physicians without refinement, showing smaller performance decreases (4.4%-5.1%) when transitioning from Phase 1 to Phase 2, compared to an 8.8% decrease for the rule-based model.</p><p><strong>Conclusion: </strong>We highlight trade-offs between the effectiveness of rule-based NLP and the generalizability and ease of implementation of LLMs for clinical entity extraction, with implications for pipeline portability across providers and institutions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145088031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ibrahim Serhat Karakus, Inna Strechen, Ankita Gupta, Keivan Nalaie, Christine L Chen, Leslie C Hassett, Amelia K Barwise
{"title":"Bridging language gaps in healthcare: a systematic review of the practical implementation of neural machine translation technologies in clinical settings.","authors":"Ibrahim Serhat Karakus, Inna Strechen, Ankita Gupta, Keivan Nalaie, Christine L Chen, Leslie C Hassett, Amelia K Barwise","doi":"10.1093/jamia/ocaf150","DOIUrl":"https://doi.org/10.1093/jamia/ocaf150","url":null,"abstract":"<p><strong>Objectives: </strong>Effective communication is crucial in healthcare, and for patients with a non-English language preference (NELP), professional interpreters are recognized as the gold standard in supporting bidirectional communication. However, interpreters are not always readily available, prompting the exploration of other options for translation and interpretation. The recent developments in artificial intelligence-based neural network translation tools, namely neural machine translation (NMT), may enable robust interpretation and translation.</p><p><strong>Materials and methods: </strong>We conducted a systematic review (SR) to evaluate the literature on NMT for this purpose. We performed a comprehensive search of several databases with guidance from a professional librarian. The search was limited to English-language publications from the year 2000 onwards. Title and abstract screening and full-text review were independently conducted by two reviewers, with conflicts resolved by a third reviewer.</p><p><strong>Results: </strong>A total of 2867 studies were identified, of which 10 were included in the final analysis. Among these, six evaluated interpretation in real or simulated clinical settings and four examined translation of discharge materials. Google Translate and ChatGPT were assessed in several studies. Accuracy differed by language, with low-resource languages performing worse.</p><p><strong>Discussion: </strong>NMT technologies in healthcare have several advantages including broad language accessibility and potential cost savings for institutions. 
Despite the improved accuracy of these novel tools, NMT tools are not yet ready for widespread clinical use because of possible critical errors.</p><p><strong>Conclusion: </strong>Future studies should focus on optimizing evaluation methods and on how best to integrate these technologies into real-time clinical settings.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145088072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiayi Tong, Yifei Sun, Rebecca A Hubbard, M Elle Saine, Hua Xu, Xu Zuo, Lifeng Lin, Chunhua Weng, Christopher H Schmid, Stephen E Kimmel, Craig A Umscheid, Adam Cuker, Yong Chen
{"title":"Incorporating preprints in systematic reviews: a preliminary study of a novel method for rapid evidence synthesis.","authors":"Jiayi Tong, Yifei Sun, Rebecca A Hubbard, M Elle Saine, Hua Xu, Xu Zuo, Lifeng Lin, Chunhua Weng, Christopher H Schmid, Stephen E Kimmel, Craig A Umscheid, Adam Cuker, Yong Chen","doi":"10.1093/jamia/ocaf111","DOIUrl":"10.1093/jamia/ocaf111","url":null,"abstract":"<p><strong>Objectives: </strong>By October 1, 2024, over 450,000 COVID-19 manuscripts were published, with 10% posted as unreviewed preprints. While they accelerate knowledge sharing, their inconsistent quality complicates systematic studies.</p><p><strong>Materials and methods: </strong>We propose a 2-stage method to include preprints in meta-analyses. In Stage A, preprints are integrated through restriction or imputation and weighted by a confidence score reflecting their publication likelihood. In Stage B, we assess and adjust for potential publication or reporting biases.</p><p><strong>Results: </strong>This preliminary study employed a 2-stage procedure validated with 2 COVID-19 treatment case studies. For hydroxychloroquine, the relative risk (RR) was 1.06 [95% CI: 0.62, 1.80], suggesting no mortality benefit over placebo. 
For corticosteroids, the RR was 0.88 [95% CI: 0.62, 1.27], which, while not statistically significant, aligns with evidence supporting a mortality benefit.</p><p><strong>Discussion: </strong>Our research aims to bridge a significant methodological gap by providing a solution for timely evidence synthesis, particularly in the face of the overwhelming number of publications surrounding COVID-19.</p><p><strong>Conclusion: </strong>This preliminary study presents a method to efficiently synthesize COVID-19 research, including non-peer-reviewed preprints, to support clinical and policy decisions amidst the information surge.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc
{"title":"Transport-based transfer learning on Electronic Health Records: application to detection of treatment disparities.","authors":"Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc","doi":"10.1093/jamia/ocaf134","DOIUrl":"https://doi.org/10.1093/jamia/ocaf134","url":null,"abstract":"<p><strong>Objectives: </strong>Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and make the data and fitted models difficult to transfer across different population groups. In this context, our main goal is to design an effective method for transferring knowledge between population groups that offers computable guarantees of suitability and can be applied to quantify treatment disparities.</p><p><strong>Materials and methods: </strong>For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.</p><p><strong>Results: </strong>We derive a theoretical bound for the generalization error of our method, and interpret it in terms of the Wasserstein distance, unbalancedness between the source and target domains, and labeling divergence, which can be used as a guide for assessing the suitability of binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency compared with standard and machine learning transfer learning methods on various tasks. 
Upon testing our method on populations with different insurance plans, we detect various levels of disparity in duration of hospital stay between groups.</p><p><strong>Discussion and conclusion: </strong>By leveraging tools from OT theory, our proposed framework allows statistical models on EHR data to be compared between different population groups. As a potential application for clinical decision making, we quantify treatment disparities between different population groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal deep learning for immunotherapy response prediction and biomarker discovery in non-small cell lung cancer.","authors":"Zijun Wang, Xi Liu, Kaitai Han, Lixin Lei, Chaojing Shi, Wu Liu, Qianjin Guo","doi":"10.1093/jamia/ocaf142","DOIUrl":"https://doi.org/10.1093/jamia/ocaf142","url":null,"abstract":"<p><strong>Objective: </strong>Immunotherapy has emerged as a promising treatment for advanced non-small cell lung cancer (NSCLC), but accurately predicting which patients will benefit from it remains a major clinical challenge. To address this, we aim to develop a novel multimodal method, DeepAFM, that integrates histopathology, genomic features, and clinical information to predict patient responses to anti-PD-(L)1 immunotherapy.</p><p><strong>Materials and methods: </strong>A total of 93 patients with advanced NSCLC were included in this study. Histopathological whole-slide images were processed using a self-supervised VQVAE2 for representation learning. PCA and K-means clustering were then applied for dimensionality reduction and feature grouping. Key regions of interest were visualized through permutation importance evaluation and color-coding techniques. The extracted histopathological features, along with genomic alterations and clinical variables, were integrated into the DeepAFM multimodal prediction model.</p><p><strong>Results: </strong>The DeepAFM achieved a high predictive performance with an area under the curve (AUC) of 0.77 (95% confidence interval: 0.69-1.00). Attention-based heatmaps revealed that the model could identify critical pathological patterns, genomic mutations, and clinical indicators associated with patient responses to immunotherapy.</p><p><strong>Discussion: </strong>The integration of multimodal data enabled the model to capture complex interactions among pathology, genomics, and clinical characteristics, enhancing the interpretability and predictive power of immunotherapy response prediction. 
The visualization techniques facilitated the identification of biologically meaningful features and potential biomarkers.</p><p><strong>Conclusion: </strong>This study demonstrates the effectiveness of the DeepAFM in predicting responses to immunotherapy in advanced NSCLC. The approach not only improves prediction accuracy but also provides valuable insights for personalized treatment strategies and biomarker discovery.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human-centered explainability evaluation in clinical decision-making: a critical review of the literature.","authors":"Jenny M Bauer, Martin Michalowski","doi":"10.1093/jamia/ocaf110","DOIUrl":"10.1093/jamia/ocaf110","url":null,"abstract":"<p><strong>Objectives: </strong>This review paper comprehensively summarizes healthcare provider (HCP) evaluation of explanations produced by explainable artificial intelligence methods to support point-of-care, patient-specific, clinical decision-making (CDM) within medical settings. It highlights the critical need to incorporate human-centered (HCP) evaluation approaches based on their CDM needs, processes, and goals.</p><p><strong>Materials and methods: </strong>The review was conducted in Ovid Medline and Scopus databases, following the Institute of Medicine's methodological standards and PRISMA guidelines. An individual study appraisal was conducted using design-specific appraisal tools. MaxQDA software was used for data extraction and evidence table procedures.</p><p><strong>Results: </strong>Of the 2673 unique records retrieved, 25 records were included in the final sample. Studies were excluded if they did not meet this review's definitions of HCP evaluation (1156), healthcare use (995), explainable AI (211), and primary research (285), and if they were not available in English (1). The sample focused primarily on physicians and diagnostic imaging use cases and revealed wide-ranging evaluation measures.</p><p><strong>Discussion: </strong>The synthesis of sampled studies suggests a potential common measure of clinical explainability with 3 indicators of interpretability, fidelity, and clinical value. 
There is an opportunity to extend the current model-centered evaluation approaches to incorporate human-centered metrics, supporting the transition into practice.</p><p><strong>Conclusion: </strong>Future research should aim to clarify and expand key concepts in HCP evaluation, propose a comprehensive evaluation model positioned in current theoretical knowledge, and develop a valid instrument to support comparisons.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1477-1484"},"PeriodicalIF":4.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144627601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rory Davidson, Will Hardman, Guy Amit, Yonatan Bilu, Vincenzo Della Mea, Aleksandr Galaida, Irena Girshovitz, Mikhail Kulyabin, Mihai Horia Popescu, Kevin Roitero, Gleb Sokolov, Chen Yanover
{"title":"SNOMED CT entity linking challenge.","authors":"Rory Davidson, Will Hardman, Guy Amit, Yonatan Bilu, Vincenzo Della Mea, Aleksandr Galaida, Irena Girshovitz, Mikhail Kulyabin, Mihai Horia Popescu, Kevin Roitero, Gleb Sokolov, Chen Yanover","doi":"10.1093/jamia/ocaf104","DOIUrl":"10.1093/jamia/ocaf104","url":null,"abstract":"<p><strong>Objective: </strong>This paper presents the results from a competition challenging participants to develop entity linking models using a subset of annotated MIMIC-IV-Note data and the SNOMED CT Terminology.</p><p><strong>Materials and methods: </strong>As a basis for this work, a large set of 74 808 annotations was curated across 272 discharge notes spanning 6624 unique clinical concepts. Submissions were evaluated using the mean Intersection-over-Union metric, evaluated at the character level with the 3 best performing solutions awarded a cash prize.</p><p><strong>Results: </strong>The winning solutions employed contrasting approaches: a dictionary-based method, an encoder-based method, and a decoder-based method.</p><p><strong>Discussion: </strong>Our analysis reveals that concept frequency in training data significantly impacts model performance, with rare concepts proving particularly challenging. 
High concept entropy and annotation ambiguity were also associated with decreased performance.</p><p><strong>Conclusion: </strong>Findings from this work suggest that future projects should focus on improving entity linking for rare concepts and developing methods to better leverage contextual information when training examples are scarce.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1397-1406"},"PeriodicalIF":4.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361850/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144627602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying family structures from obituaries and matching them to patients in an electronic health record.","authors":"John Mayer, Brooke Delgoffe, Scott Hebbring","doi":"10.1093/jamia/ocaf102","DOIUrl":"10.1093/jamia/ocaf102","url":null,"abstract":"<p><strong>Objectives: </strong>Family data are a valuable data source in bioinformatic research. This is because family members often share common genetic and environmental exposures. Collecting these family data is traditionally very labor intensive, but advances in electronic health record (EHR) data mining have proven useful when identifying pedigrees linked to longitudinal health histories. These are called e-pedigrees. Unfortunately, e-pedigrees tend to miss the oldest patients, who inherently have the longest and richest health histories. A good source of family data from older generations is obituaries, as their formulaic nature makes them a good candidate for natural language processing (NLP) that can extract relationships to the decedent. While there have been several studies on obtaining such data from obituaries, we demonstrate for the first time approaches that tie that information to an EHR.</p><p><strong>Methods: </strong>Natural language processing extraction resulted in 8 166 534 family members being abstracted from 567 279 obituaries published in the state of Wisconsin. After matching decedent and family members to patients in the EHR, we identified 200 033 unique patients that were put in 53 640 pedigrees.</p><p><strong>Results: </strong>The largest pedigree consisted of 21 individuals. Heritability of adult height was quantified (H2=0.51±0.04, P<1.00e-07), demonstrating these data's use in genetic research. 
The heritability data, coupled with overlapping data in a biobank, suggested that 80%-90% of familial relationships were accurately defined.</p><p><strong>Conclusion: </strong>The totality of these findings demonstrates that obituaries of the oldest people in society can be highly informative for bioinformatic research.</p><p><strong>Availability and implementation: </strong>Code is available on GitHub at https://github.com/jgmayer672/ObituaryNLP.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1407-1414"},"PeriodicalIF":4.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361849/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144509198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victor M Castro, Vivian S Gainer, Danielle M Crookes, Shawn N Murphy, Justin Manjourides
{"title":"Comparing patient-reported symptoms and structured clinician documentation in electronic health records.","authors":"Victor M Castro, Vivian S Gainer, Danielle M Crookes, Shawn N Murphy, Justin Manjourides","doi":"10.1093/jamia/ocaf112","DOIUrl":"10.1093/jamia/ocaf112","url":null,"abstract":"<p><strong>Objectives: </strong>Real-world data (RWD) analyses primarily rely on structured clinical documentation collected through routine clinical care or driven by medical billing requirements. Patient-reported outcome measures (PROMs), integrated into electronic health records (EHRs), are an additional data source that could offer valuable insights into a patient's perspective and contribute to a more comprehensive understanding of health outcomes in RWD studies. This study aims to characterize agreement between PROMs symptoms and structured clinical documentation of these symptoms by clinicians in EHRs.</p><p><strong>Materials and methods: </strong>We conducted a cross-sectional study of 913 244 adult primary care annual physical visits between January 1, 2019 and December 31, 2023. We compared differences in prevalence and agreement of patient-reported symptoms (PRS) and structured clinician documentation (CD) across 15 respiratory, gastrointestinal, cardiometabolic, and neuropsychiatric symptoms.</p><p><strong>Results: </strong>Patient-reported symptom prevalence was significantly higher than CD prevalence across most symptoms, including joint pain (33% PRS vs 12% CD), headaches (17% PRS vs 8.8% CD), and sleep disturbance (24% PRS vs 10% CD). Clinicians documented anxiety (11% PRS vs 23% CD) and depression (6.6% PRS vs 15.4% CD) symptoms using structured codes at higher rates than patients reported them. 
Agreement between symptom self-report and clinician-documented structured codes was low to moderate (κ: 0.06-0.39).</p><p><strong>Discussion: </strong>Primary care patients self-report symptoms up to ten times more frequently than clinicians document them with structured codes in the EHR.</p><p><strong>Conclusion: </strong>This work demonstrates the value and feasibility of incorporating PRSs in RWD studies to reduce misclassification and more holistically capture a patient's health.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1454-1461"},"PeriodicalIF":4.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361852/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144664110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}