Bernardo Consoli, Haoyang Wang, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding
{"title":"SDoH-GPT: using large language models to extract social determinants of health.","authors":"Bernardo Consoli, Haoyang Wang, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding","doi":"10.1093/jamia/ocaf094","DOIUrl":"10.1093/jamia/ocaf094","url":null,"abstract":"<p><strong>Objective: </strong>Extracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability.</p><p><strong>Materials and methods: </strong>SDoH-GPT is a framework that combines few-shot learning LLM methods, which extract SDoH from medical notes, with XGBoost classifiers, which continue to classify SDoH using the annotations generated by the few-shot learning LLM methods as training datasets. The combination of few-shot learning LLM methods with XGBoost exploits the strength of LLMs as few-shot learners and the efficiency of XGBoost when the training dataset is sufficient. Therefore, SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention.</p><p><strong>Results: </strong>Our approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, with Cohen's kappa of up to 0.92. The combination of LLM and XGBoost can ensure high accuracy and computational efficiency while consistently maintaining AUROC scores of 0.90 or higher.</p><p><strong>Discussion: </strong>This study has verified SDoH-GPT on three datasets and highlights the potential of leveraging LLM and XGBoost to revolutionize medical note classification, demonstrating its capability to achieve highly accurate classifications with significantly reduced time and cost.</p><p><strong>Conclusion: </strong>The key contribution of this study is the integration of LLM with XGBoost, which enables cost-effective and high-quality annotations of SDoH. This research sets the stage for SDoH to be more accessible, scalable, and impactful in driving future healthcare solutions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144267837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reply to Layne et al.'s Letter to the Editor.","authors":"Cathy Shyr, Paul A Harris","doi":"10.1093/jamia/ocaf026","DOIUrl":"10.1093/jamia/ocaf026","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1089"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089780/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143416026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nidhi Soley, Ilia Rattsev, Traci J Speed, Anping Xie, Kadija S Ferryman, Casey Overby Taylor
{"title":"Predicting postoperative chronic opioid use with fair machine learning models integrating multi-modal data sources: a demonstration of ethical machine learning in healthcare.","authors":"Nidhi Soley, Ilia Rattsev, Traci J Speed, Anping Xie, Kadija S Ferryman, Casey Overby Taylor","doi":"10.1093/jamia/ocaf053","DOIUrl":"10.1093/jamia/ocaf053","url":null,"abstract":"<p><strong>Objective: </strong>Building upon our previous work on predicting chronic opioid use using electronic health records (EHR) and wearable data, this study leveraged the Health Equity Across the AI Lifecycle (HEAAL) framework to (a) fine-tune the previously built model with genomic data and evaluate model performance in predicting chronic opioid use and (b) apply IBM's AIF360 pre-processing toolkit to mitigate bias related to gender and race and evaluate the model performance using various fairness metrics.</p><p><strong>Materials and methods: </strong>Participants included approximately 271 All of Us Research Program subjects with EHR, wearable, and genomic data. We fine-tuned 4 machine learning models on the new dataset. The SHapley Additive exPlanations (SHAP) technique identified the best-performing predictors. A preprocessing toolkit boosted fairness by gender and race.</p><p><strong>Results: </strong>The genetic data enhanced model performance relative to the prior model, with the area under the curve improving from 0.90 (95% CI, 0.88-0.92) to 0.95 (95% CI, 0.89-0.95). Key predictors included Dopamine D1 Receptor (DRD1) rs4532, general type of surgery, and time spent in physical activity. The reweighing preprocessing technique applied to the stacking algorithm effectively improved the model's fairness across racial and gender groups without compromising performance.</p><p><strong>Conclusion: </strong>We leveraged 2 dimensions of the HEAAL framework to build a fair artificial intelligence (AI) solution. Using multi-modal datasets (including wearable and genetic data) and applying bias mitigation strategies can help models assess risk more fairly and accurately across diverse populations, promoting fairness in AI in healthcare.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"985-997"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089784/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Letter to the editors in response to \"Leveraging artificial intelligence to summarize abstracts in lay language for increasing research accessibility and transparency\".","authors":"Ethan Layne, Francesco Cei, Giovanni E Cacciamani","doi":"10.1093/jamia/ocaf024","DOIUrl":"10.1093/jamia/ocaf024","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1087-1088"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143617684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John P Powers, Samyuktha Nandhakumar, Sofia Z Dard, Paul Kovach, Peter J Leese
{"title":"Recovering missing electronic health record mortality data with a machine learning-enhanced data linkage process.","authors":"John P Powers, Samyuktha Nandhakumar, Sofia Z Dard, Paul Kovach, Peter J Leese","doi":"10.1093/jamia/ocaf060","DOIUrl":"10.1093/jamia/ocaf060","url":null,"abstract":"<p><strong>Objective: </strong>To develop a continual process for linking more comprehensive external mortality data to electronic health records (EHRs) for a large healthcare system, which can serve as a template for other healthcare systems.</p><p><strong>Materials and methods: </strong>Monthly updates of state death records were arranged, and an automated pipeline was developed to identify matches with patients in the EHR. A machine learning classifier was used to closely match human classification performance of potential record matches.</p><p><strong>Results: </strong>The automated linkage process achieved high performance in classifying potential record matches, with a sensitivity of 99.3% and specificity of 98.8% relative to manual classification. Only 22.4% of identified patient deaths were previously indicated in the EHR.</p><p><strong>Discussion and conclusions: </strong>We developed a solution for recovering missing mortality data for EHR that is effective, scalable for cost and computation, and sustainable over time. These recovered mortality data now supplement the EHR data available for research purposes.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1061-1065"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089760/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143993921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Catina O'Leary, Milton Mickey Eder, Sumana Goli, Sam Pettyjohn, Elizabeth Rattine-Flaherty, Yousra Jatt, Linda B Cottler
{"title":"Assessing health literacy and diversity within the All of Us Research Program.","authors":"Catina O'Leary, Milton Mickey Eder, Sumana Goli, Sam Pettyjohn, Elizabeth Rattine-Flaherty, Yousra Jatt, Linda B Cottler","doi":"10.1093/jamia/ocae225","DOIUrl":"10.1093/jamia/ocae225","url":null,"abstract":"<p><strong>Objective: </strong>The objective was to understand the association between people with adequate and inadequate health literacy (HL) in the All of Us cohort.</p><p><strong>Materials and methods: </strong>Overall, health survey responses to 3 questions from 246 555 people, ages 18-77 years in the controlled tier V7 dataset, were used to assess and compare HL. HL scores ranged from 3 to 15, with scores ≤9 indicating inadequate HL and >9 indicating adequate HL.</p><p><strong>Results: </strong>Cohort participants' responses indicate 92.4% met criteria for adequate HL. Persons with inadequate HL versus adequate HL were more likely to be Gen X, male, Black, report an income less than $25k, and have less than a high school education. Furthermore, the rate of HL may not represent that for the broader US population.</p><p><strong>Discussion: </strong>All of Us participants had much higher rates of HL than that for the 2003 National Assessment of Adult Literacy, suggesting approximately over 90% of the US population has HL challenges. The All of Us cohort's high rates of HL may reflect response and recruitment bias. Given the emphasis on diversity and inclusion within the cohort, and understanding HL as the ability to find, understand, and use health information, revisiting the recruitment strategies and, potentially, the assessment of HL within the All of Us cohort is recommended.</p><p><strong>Conclusion: </strong>Factoring HL into diversity and inclusion research recruitment efforts will require review and testing of innovative approaches to community recruitment, engagement, and retention methods. Infusing HL into precision medicine can advance opportunities for individual improvement in health promotion and disease management. Future population level efforts in precision medicine should consider more sensitive measures to critical social determinants of health, such as health literacy, to more carefully characterize diversity and inclusion in these studies.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1025-1031"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089785/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144020167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix J Dorfner, Amin Dada, Felix Busch, Marcus R Makowski, Tianyu Han, Daniel Truhn, Jens Kleesiek, Madhumita Sushil, Lisa C Adams, Keno K Bressem
{"title":"Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.","authors":"Felix J Dorfner, Amin Dada, Felix Busch, Marcus R Makowski, Tianyu Han, Daniel Truhn, Jens Kleesiek, Madhumita Sushil, Lisa C Adams, Keno K Bressem","doi":"10.1093/jamia/ocaf045","DOIUrl":"10.1093/jamia/ocaf045","url":null,"abstract":"<p><strong>Objectives: </strong>Large language models (LLMs) have shown potential in biomedical applications, leading to efforts to fine-tune them on domain-specific data. However, the effectiveness of this approach remains unclear. This study aims to critically evaluate the performance of biomedically fine-tuned LLMs against their general-purpose counterparts across a range of clinical tasks.</p><p><strong>Materials and methods: </strong>We evaluated the performance of biomedically fine-tuned LLMs against their general-purpose counterparts on clinical case challenges from NEJM and JAMA, and on multiple clinical tasks, such as information extraction, document summarization and clinical coding. We used a diverse set of benchmarks specifically chosen to be outside the likely fine-tuning datasets of biomedical models, ensuring a fair assessment of generalization capabilities.</p><p><strong>Results: </strong>Biomedical LLMs generally underperformed compared to general-purpose models, especially on tasks not focused on probing medical knowledge. While on the case challenges, larger biomedical and general-purpose models showed similar performance (eg, OpenBioLLM-70B: 66.4% vs Llama-3-70B-Instruct: 65% on JAMA), smaller biomedical models showed more pronounced underperformance (OpenBioLLM-8B: 30% vs Llama-3-8B-Instruct: 64.3% on NEJM). Similar trends appeared across CLUE benchmarks, with general-purpose models often achieving higher scores in text generation, question answering, and coding. Notably, biomedical LLMs also showed a higher tendency to hallucinate.</p><p><strong>Discussion: </strong>Our findings challenge the assumption that biomedical fine-tuning inherently improves LLM performance, as general-purpose models consistently performed better on unseen medical tasks. Retrieval-augmented generation may offer a more effective strategy for clinical adaptation.</p><p><strong>Conclusion: </strong>Fine-tuning LLMs on biomedical data may not yield the anticipated benefits. Alternative approaches, such as retrieval augmentation, should be further explored for effective and reliable clinical integration of LLMs.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1015-1024"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089759/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adam Rule, Phillip Vang, Mark A Micek, Brian G Arndt
{"title":"Primary care staff members' experiences with managing electronic health record inbox messages.","authors":"Adam Rule, Phillip Vang, Mark A Micek, Brian G Arndt","doi":"10.1093/jamia/ocaf067","DOIUrl":"10.1093/jamia/ocaf067","url":null,"abstract":"<p><strong>Objective: </strong>Clinical staff often help clinicians review and respond to messages from patients. This study aimed to characterize primary care staff members' experiences with inbox work.</p><p><strong>Materials and methods: </strong>In this qualitative study, we conducted direct observations and focus groups with clinical staff at 4 academic primary care clinics. We used inductive thematic analysis to code the resulting notes and transcripts for themes in staff members' experience with inbox work.</p><p><strong>Results: </strong>Nine medical assistants and 3 nurses participated in the study. Staff described inbox work as fragmented, feeling like an assembly line, requiring frequent communication with other team members to clarify and manage tasks, and requiring navigation of expectations that varied between patients, clinicians, and clinics. Staff described some messages as being more difficult to manage due to how requests were posed, challenges with subsequent communication, and mismatches between data from different sources. Staff also described how tools that structured or automated message management aided inbox work.</p><p><strong>Discussion: </strong>Staff addressed routine messages by following known protocols and appreciated tools that structured their inbox work. However, staff also regularly encountered messages with information that conflicted with clinic records or that contained multiple, redundant, or vague requests. Addressing these messages required additional work to clarify information (ie, data work) and manage resulting tasks (ie, articulation work).</p><p><strong>Conclusion: </strong>Clinic workflows and health information technology should support not only the readily standardized work of addressing routine messages but also the more varied work of preparing messages to be addressed in the first place.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1040-1049"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089763/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144005422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing the power of large language models for clinical tasks and synthesis of scientific literature.","authors":"Suzanne Bakken","doi":"10.1093/jamia/ocaf071","DOIUrl":"10.1093/jamia/ocaf071","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":"32 6","pages":"983-984"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089756/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144103139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mehdi Nourelahi, Eugene M Sadhu, Malarkodi J Samayamuthu, Shyam Visweswaran
{"title":"A resource for Logical Observation Identifiers Names and Codes terms that may be associated with identifying information.","authors":"Mehdi Nourelahi, Eugene M Sadhu, Malarkodi J Samayamuthu, Shyam Visweswaran","doi":"10.1093/jamia/ocaf061","DOIUrl":"10.1093/jamia/ocaf061","url":null,"abstract":"<p><strong>Objectives: </strong>The primary objective was to compile a comprehensive list of Logical Observation Identifiers Names and Codes (LOINC) terms that may be associated with patient, healthcare provider, and healthcare facility identifying information.</p><p><strong>Materials and methods: </strong>We developed a 2-step procedure for identifying LOINC terms, which consists of a keyword search of Long Common Names and filtering on selected property values, followed by expert physician review to confirm and categorize the terms.</p><p><strong>Results: </strong>The final list comprises 1309 LOINC terms potentially associated with identifying information of patients, providers, and facilities. This list is publicly available on GitHub.</p><p><strong>Discussion: </strong>Compared with electronic health record data coded with other terminologies, LOINC-coded data present unique challenges for deidentification, and a resource of LOINC terms that may be associated with identifying information will be helpful for this purpose.</p><p><strong>Conclusion: </strong>This resource is valuable for deidentifying LOINC-coded data, ensuring compliance with the Health Insurance Portability and Accountability Act (HIPAA), and preserving the privacy of patients, providers, and facilities.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1066-1070"},"PeriodicalIF":4.7,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089774/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144020066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}