JAMIA Open. Pub Date: 2026-04-17. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag054
Min Zhao, Inez Y Oh, Aditi Gupta, Sally Cohen-Cutler, Kathryn M Harmoney, Albert M Lai, Bryan A Sisk
{"title":"Automating evaluation of LLM-generated responses to patient questions about rare diseases.","authors":"Min Zhao, Inez Y Oh, Aditi Gupta, Sally Cohen-Cutler, Kathryn M Harmoney, Albert M Lai, Bryan A Sisk","doi":"10.1093/jamiaopen/ooag054","DOIUrl":"10.1093/jamiaopen/ooag054","url":null,"abstract":"<p><strong>Objectives: </strong>Patients with rare diseases often struggle to find accurate medical information, and large language model (LLM)-based chatbots may help meet this need. However, evaluating LLM-generated free-text answers typically requires physician review, which is time-consuming and difficult to scale. This study compared traditional natural language processing (NLP) metrics to emerging LLM-based evaluation approaches for assessing answer quality in the context of Complex Lymphatic Anomalies (CLAs).</p><p><strong>Materials and methods: </strong>We compiled 25 common patient questions about CLAs and generated 175 responses to these questions from seven LLMs. Three expert physicians scored these responses for accuracy. We compared these physician-assigned scores with automated scores generated by four NLP sentence similarity metrics (BLEU, ROUGE, METEOR, BERTScore) and six LLM evaluators (GPT-4, GPT-4o, Qwen3-32B, DeepSeek-R1-14B, Gemma3-27B, LLaMA3.3-70B). We examined LLM-based scoring both with and without reference answers (reference-guided vs reference-free). We calculated Spearman, Phi, and Kendall's Tau correlation coefficients to assess alignment between automated and physician-assigned scores.</p><p><strong>Results: </strong>LLM-based evaluation demonstrated stronger alignment with physician-assigned scores than NLP metrics. The reference-guided GPT-4 evaluator achieved the highest correlation with physician-assigned scores (<i>ρ</i> = 0.758), followed by GPT-4o (<i>ρ</i> = 0.727). NLP metrics showed weak to moderate correlations with physician-assigned scores (<i>ρ</i> = 0.240-0.403). 
Reference-guided scoring outperformed reference-free methods.</p><p><strong>Discussion: </strong>Reference-guided LLM-based evaluation methods approximate expert physicians' judgment better than traditional NLP metrics, offering an effective, scalable approach for assessing LLM-generated responses to patient questions about rare disease.</p><p><strong>Conclusion: </strong>LLM-based evaluation, particularly reference-guided scoring with GPT models, can support the scalable development and evaluation of LLM-based rare disease-specific chatbot systems.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag054"},"PeriodicalIF":3.4,"publicationDate":"2026-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13089572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147724035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
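The alignment analysis this record describes (comparing automated scores against physician-assigned scores) reduces to rank correlation. A minimal pure-Python sketch of Spearman's ρ with tie-aware ranking follows; the score values are invented for illustration and are not the study's data:

```python
def ranks(xs):
    """Average (tie-aware) 1-based ranks of xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical physician scores (1-5) and automated evaluator scores
physician = [5, 4, 4, 2, 1, 3]
automated = [4.8, 4.1, 3.9, 2.2, 1.5, 3.0]
rho = spearman(physician, automated)
```

In practice `scipy.stats.spearmanr` does this (plus p-values), but the hand-rolled version makes the rank-then-correlate logic explicit.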
JAMIA Open. Pub Date: 2026-04-12. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag041
Jordan Tschida, Mayanka Chandrashekar, Heidi A Hanson, Ian Goethert, Daniel Santel, John Pestian, Jeffery R Strawn, Tracy Glauser, Anuj J Kapadia, Greeshma A Agasthya
{"title":"Evolving language of pediatric anxiety in electronic health records.","authors":"Jordan Tschida, Mayanka Chandrashekar, Heidi A Hanson, Ian Goethert, Daniel Santel, John Pestian, Jeffery R Strawn, Tracy Glauser, Anuj J Kapadia, Greeshma A Agasthya","doi":"10.1093/jamiaopen/ooag041","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag041","url":null,"abstract":"<p><strong>Objectives: </strong>This study aimed to identify and quantify semantic drift (ie, the change in semantic meaning over time) within expert-defined anxiety-related (AR) terminology and compare it to common electronic health record (EHR) vocabulary across longitudinal pediatric clinical notes.</p><p><strong>Materials and methods: </strong>A corpus of pediatric clinical notes from 2009 to 2022 was analyzed using computational methods. Semantic drift for each term was quantified using cosine similarity between annual temporal word embeddings. Contextual meaning was examined through changes in nearest neighbors across years. The Laws of Semantic Change were applied to assess the influence of word frequency and polysemy. Vocabulary terms were categorized as AR or common EHR.</p><p><strong>Results: </strong>98% of AR terminology maintained a cosine similarity between 0.00 and 0.50, indicating moderate semantic stability, whereas 90% of common EHR terms remained between 0.00 and 0.25, showing greater contextual stability overall. Frequent terms exhibited minimal change (Frequency Coefficient = 0.04), whereas highly polysemous or abbreviated terms showed less stability (Polysemy Coefficient = 0.630). 
AR terminology drifted more slowly than general EHR vocabulary (Type Coefficient = -0.179), further supported by significant year-type interactions (Coef = -0.09 to -0.523).</p><p><strong>Discussion: </strong>Although anxiety-related terminology demonstrates slower semantic drift than general EHR vocabulary, subtle contextual shifts still occur that may affect downstream interpretability and retrieval in automated systems.</p><p><strong>Conclusion: </strong>Continuous linguistic monitoring and adaptive modeling are essential to maintain semantic fidelity and ensure the long-term reliability of clinical decision support systems as healthcare documentation evolves.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag041"},"PeriodicalIF":3.4,"publicationDate":"2026-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13071396/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147692489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
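The drift measure used in this record, cosine similarity between a term's embeddings in consecutive years, can be sketched as below. The 3-dimensional vectors are toy values, not the study's embeddings; real diachronic embeddings also typically need to be aligned across years (e.g., via orthogonal Procrustes) before they are comparable:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def year_over_year_similarity(yearly_embeddings):
    """Similarity between consecutive annual embeddings of one term.
    Lower similarity across years indicates greater semantic drift."""
    return [cosine(yearly_embeddings[i], yearly_embeddings[i + 1])
            for i in range(len(yearly_embeddings) - 1)]

# Toy embeddings of one term for three consecutive years; the last year
# shifts direction, mimicking a contextual-meaning change
vecs = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.1], [0.1, 0.9, 0.4]]
sims = year_over_year_similarity(vecs)
```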
JAMIA Open. Pub Date: 2026-04-11. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag046
Sumaiya Afroz Mila, Sandip Ray
{"title":"Multivariate time-series forecasting of liver biomarkers from longitudinal lifestyle data for nonalcoholic steatohepatitis detection.","authors":"Sumaiya Afroz Mila, Sandip Ray","doi":"10.1093/jamiaopen/ooag046","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag046","url":null,"abstract":"<p><strong>Objectives: </strong>To develop a machine learning method that estimates future liver biomarker values from longitudinal lifestyle (diet, activity) data for early detection of nonalcoholic steatohepatitis (NASH).</p><p><strong>Materials and methods: </strong>The method in this study was developed using the nonalcoholic fatty liver disease adult dataset from the National Institute of Diabetes and Digestive and Kidney Diseases, a real-world dataset representative of common electronic health records in the United States. We developed time-series machine learning/deep learning and tree-based models to forecast future values for liver biomarkers, identified the minimum number of initial data points required for optimal forecasting performance, and developed time-series classifier models for detecting NASH from longitudinal lifestyle data and initial biomarker values.</p><p><strong>Results: </strong>Our experiments show that lifestyle-informed forecasting models, such as attention-based long short-term memory and TimeSeriesForestRegressor, accurately predict future biomarker trajectories with as few as 2 observed timepoints (prediction error as low as 0.62), and NASH classifiers trained on these <i>Fo</i>recasting liver <i>Bi</i>omarkers (<i>FoBi</i>) estimated biomarkers achieve performance (accuracy 86%) comparable to or exceeding existing biopsy-aligned methods.</p><p><strong>Discussion: </strong>The proposed approach, <i>FoBi</i>, is the first method to forecast liver biomarker trajectories from lifestyle data and to demonstrate that both observed and model-estimated biomarkers can support effective NASH detection in real-world clinical settings.</p><p><strong>Conclusion: 
</strong>Lifestyle-driven biomarker forecasting offers a promising, minimally invasive foundation for early NASH detection and long-term disease management, reducing dependence on frequent laboratory testing and biopsy-aligned measurements.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag046"},"PeriodicalIF":3.4,"publicationDate":"2026-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13070654/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147677382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
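To make concrete what "forecasting from as few as 2 observed timepoints" means, here is a deliberately naive linear extrapolation baseline; it is far simpler than the attention-LSTM and TimeSeriesForestRegressor models this study actually uses, and the biomarker values are hypothetical:

```python
def linear_forecast(values, steps):
    """Extrapolate future values from the last two observations
    by continuing their straight-line trend."""
    if len(values) < 2:
        raise ValueError("need at least 2 observed timepoints")
    slope = values[-1] - values[-2]
    return [values[-1] + slope * k for k in range(1, steps + 1)]

# Hypothetical ALT measurements (U/L) at two successive visits
alt = [32.0, 38.0]
future = linear_forecast(alt, steps=3)  # [44.0, 50.0, 56.0]
```

A learned model improves on this baseline by conditioning the trajectory on lifestyle covariates rather than on the biomarker's own trend alone.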
JAMIA Open. Pub Date: 2026-04-09. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag045
Zhinya Kawa Othman, Mohamed Mustaf Ahmed, Olalekan John Okesanya, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno
{"title":"Digital vaccines for immunization equity: an approach to strengthen vaccine delivery and public trust in low- and middle-income countries.","authors":"Zhinya Kawa Othman, Mohamed Mustaf Ahmed, Olalekan John Okesanya, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno","doi":"10.1093/jamiaopen/ooag045","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag045","url":null,"abstract":"<p><strong>Background: </strong>Global immunization efforts still face major inequities and declining vaccine confidence, leaving millions of children in low- and middle-income countries unvaccinated or under-vaccinated.</p><p><strong>Objectives: </strong>This article aims to discuss \"digital vaccines,\" including SMS reminders, mobile apps, electronic immunization registries, gamification, and virtual reality education, as practical complements to routine immunization services.</p><p><strong>Results: </strong>Using an organizing framework focused on access, equity, and trust, we highlight how digital tools can reduce missed appointments, strengthen follow-up for zero-dose children, improve data quality for planning, and support transparent and culturally responsive communication to counter misinformation. We also outline the barriers that limit equitable impact, including digital divides, gender gaps in phone access, fragmented information systems, limited financing, and concerns about data governance. Many children in poorer countries still do not get the vaccines they need. Some families live too far from clinics. Others do not trust vaccines or the health system. This article looks at how digital tools can help more children get vaccinated. These tools include text message reminders, phone apps, online health records, digital games, and virtual reality lessons. Text reminders help parents remember vaccine dates. Online records help health workers find children who missed their vaccines. Digital games teach people why vaccines are safe. 
These tools can also help planners know how many vaccines are needed and where to send them. They can share clear, respectful health messages and fight false claims about vaccines. But not everyone can use these tools. Some people do not have smartphones or internet access. Women, who often care for children, may not have their own phones. There are also worries about keeping personal data safe and paying for these systems.</p><p><strong>Conclusions: </strong>We propose implementation principles that emphasize inclusive design, interoperability, privacy safeguards, and hybrid online and offline delivery models. We suggest that digital tools should be easy to use for all, keep private data safe, and work well with other health systems. Where there is no internet, non-digital options should also be offered. With the right support, these tools can help make sure all children get their vaccines.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag045"},"PeriodicalIF":3.4,"publicationDate":"2026-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13070474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147677424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open. Pub Date: 2026-04-09. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag043
Fangwen Zhou, Muhammad Afzal, Ashirbani Saha, Rick Parrish, R Brian Haynes, Alfonso Iorio, Cynthia Lokker
{"title":"Zero-shot interpretable biomedical literature appraisal with generative large language models.","authors":"Fangwen Zhou, Muhammad Afzal, Ashirbani Saha, Rick Parrish, R Brian Haynes, Alfonso Iorio, Cynthia Lokker","doi":"10.1093/jamiaopen/ooag043","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag043","url":null,"abstract":"<p><strong>Objective: </strong>This study aims to apply 2 decoder-based Generative Pre-trained Transformer (GPT) models (GPT-4o and GPT-o3-mini) in automating the methodological appraisal of randomized controlled trials (RCTs), under a variety of prompt designs, and to compare their performance to a fine-tuned encoder-only BioLinkBERT model.</p><p><strong>Materials and methods: </strong>A stratified random sample of 800 articles from the McMaster Premium LiteratUre Service and Clinical Hedges databases was appraised using 2 prompting schemes: (1) classifier (independent assessment) and (2) verifier (validation of BioLinkBERT) considering either the title and abstract (TIAB) or the full text of an article. Performance was primarily evaluated against human assessments using Matthews correlation coefficient (MCC). Bootstrapping over 1000 iterations was used to estimate 95% CIs.</p><p><strong>Results: </strong>GPT-4o as a classifier with full text demonstrated comparable performance (MCC 0.429; 95% CI, 0.387-0.470) to BioLinkBERT (MCC, 0.466; 95% CI, 0.409-0.519), drastically outperforming the best GPT-o3-mini scheme (MCC, 0.272; 95% CI, 0.211-0.334). GPT-4o as a verifier with full text showed similar performance (MCC, 0.391; 95% CI, 0.335-0.444). GPT models provided transparent criterion-specific justifications. 
Performance using TIAB alone markedly decreased for GPT models (MCC, ≤0.100), highlighting dependency on detailed methodological information.</p><p><strong>Discussion: </strong>GPT-4o effectively automates RCT critical appraisal with comparable performance to specialized fine-tuned models when provided full text, enhancing interpretability and transparency through explicit justifications. Limitations in abstract-level detail suggest complementary roles for fine-tuned models when full texts are unavailable. Future studies should optimize goal-specific prompting to further facilitate adoption in clinical knowledge translation workflows.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag043"},"PeriodicalIF":3.4,"publicationDate":"2026-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13070470/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147677438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
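The evaluation approach in this record, Matthews correlation coefficient with a percentile-bootstrap confidence interval, can be sketched in plain Python; the labels below are invented, not the study's appraisal data:

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, n_iter=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for MCC: resample cases with replacement,
    recompute the statistic, and take the alpha/2 tails."""
    rng = random.Random(seed)
    m = len(y_true)
    stats = []
    for _ in range(n_iter):
        idx = [rng.randrange(m) for _ in range(m)]
        stats.append(mcc([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(n_iter * alpha / 2)], stats[int(n_iter * (1 - alpha / 2)) - 1]

# Toy appraisal labels: 3 of 4 positives and 3 of 4 negatives correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
point = mcc(y_true, y_pred)  # 0.5
low, high = bootstrap_ci(y_true, y_pred, n_iter=500)
```

MCC is a sensible choice here because it stays informative under the class imbalance typical of literature-appraisal datasets.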
JAMIA Open. Pub Date: 2026-04-05. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag044
Mustafa Ozkaynak, Saira Haque, Kim M Unertl, Craig Kuziemsky
{"title":"Pragmatic approaches for studying workflow in health informatics.","authors":"Mustafa Ozkaynak, Saira Haque, Kim M Unertl, Craig Kuziemsky","doi":"10.1093/jamiaopen/ooag044","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag044","url":null,"abstract":"<p><strong>Objective: </strong>To develop recommendations to inform the development and use of pragmatic workflow approaches.</p><p><strong>Problem description: </strong>Workflow analysis in research is rigorous but also resource-intensive, requiring extensive expertise, labor, and time. However, workflow analysis is also needed in clinics during technology implementation or ongoing evaluation, where such labor and time are not readily available. Pragmatic workflow approaches are required to enable individuals and organizations to conduct workflow analysis as needed, in a timely and efficient way.</p><p><strong>Results: </strong>We adapted five principles to guide pragmatic workflow studies: relevance, actionability, comprehensibility, ethical reasoning, and iterative assessment. We also detailed a six-step guideline.</p><p><strong>Discussion and conclusion: </strong>Pragmatic workflow approaches use fewer resources by appropriately trading off rigor to create empirically relevant workflow analysis and recommendations. However, managing this trade-off can be challenging. 
We can learn from case studies that apply these approaches to determine strategies for future modifications.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag044"},"PeriodicalIF":3.4,"publicationDate":"2026-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13050531/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147628775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open. Pub Date: 2026-04-03. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag033
Paul J Barr, Michelle D Dannenberg, Craig H Ganoe, Elizabeth Carpenter-Song, Reed Wr Bratches, Meredith C Masel, Renata W Yen, Kerri L Cavanaugh, William Haslett, Rebecca Faill, Roger Arend, Sheri Piper, James Ryan, Glyn Elwyn
{"title":"Providing routine digital recordings of clinic visits to patients: a multiple-case study of three settings in the U.S.","authors":"Paul J Barr, Michelle D Dannenberg, Craig H Ganoe, Elizabeth Carpenter-Song, Reed Wr Bratches, Meredith C Masel, Renata W Yen, Kerri L Cavanaugh, William Haslett, Rebecca Faill, Roger Arend, Sheri Piper, James Ryan, Glyn Elwyn","doi":"10.1093/jamiaopen/ooag033","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag033","url":null,"abstract":"<p><strong>Objective: </strong>To explore the impact, barriers, and facilitators of routinely sharing clinic visit recordings with patients in diverse clinical settings.</p><p><strong>Materials and methods: </strong>We conducted a multiple-case study of three early-adopter clinics in the U.S.: a primary care clinic in Michigan and an oncology clinic in Texas that shared audio recordings, and a neurology clinic in Arizona that shared video recordings. From March 2016 to January 2017, we conducted semi-structured interviews with clinicians, patients, care partners, and administrators (≥18 years, English-speaking), and direct observation of patients using their recordings. Transcripts were analyzed using framework analysis to identify cross-cutting themes. Three coders independently reviewed all transcripts, and a medical anthropologist audited key analytic stages.</p><p><strong>Results: </strong>We interviewed 67 stakeholders (32 patients, 10 care partners, 15 clinicians, and 10 administrators). Across sites, stakeholders reported that recordings improved patients' recall, understanding, and communication. Patients also used recordings for reflection on their performance in visits and planning, while care partners described reduced anxiety and enhanced involvement. Clinicians reported improved visit interactions, and some used recordings for self-assessment. 
Key factors influencing implementation included clinic culture, institutional support, workflow logistics, data security, and patient characteristics. Concerns were limited and focused primarily on data privacy. A conceptual framework summarizing themes related to barriers, facilitators, use, and impact of routine recording in healthcare was developed.</p><p><strong>Discussion: </strong>Routinely sharing visit recordings can enhance patient-centered communication and care partner engagement while supporting clinician performance. Successful implementation depends on aligning institutional culture, privacy safeguards, and workflow integration.</p><p><strong>Conclusion: </strong>Sharing visit recordings was acceptable and beneficial across stakeholders. The practice of sharing recordings revealed that clinic visit interventions are more than just transactions of medical information-they promote emotional support, self-reflection, and family engagement.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag033"},"PeriodicalIF":3.4,"publicationDate":"2026-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13049191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open. Pub Date: 2026-04-03. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag042
Usman Shahid, Natalie Parde, Dale L Smith, Grayson Dickinson, Joseph Bianco, Dillon Thorpe, Madhav Hota, Majid Afshar, Niranjan S Karnik, Neeraj Chhabra
{"title":"Development and evaluation of machine learning models for the detection of emergency department patients with opioid misuse from clinical notes.","authors":"Usman Shahid, Natalie Parde, Dale L Smith, Grayson Dickinson, Joseph Bianco, Dillon Thorpe, Madhav Hota, Majid Afshar, Niranjan S Karnik, Neeraj Chhabra","doi":"10.1093/jamiaopen/ooag042","DOIUrl":"10.1093/jamiaopen/ooag042","url":null,"abstract":"<p><strong>Objectives: </strong>The accurate identification of Emergency Department (ED) encounters involving opioid misuse is critical for health services, research, and surveillance. We sought to develop natural language processing (NLP)-based models for the detection of ED encounters involving opioid misuse.</p><p><strong>Methods: </strong>A sample of ED encounters enriched for opioid misuse was manually annotated and clinical notes extracted. We evaluated classic machine learning (ML) methods, fine-tuning of publicly available pretrained language models, and a previously developed convolutional neural network opioid classifier for use on hospitalized patients (SMART-AI). Performance was benchmarked to opioid-related ICD-10-CM codes. Both raw text and text transformed to the Unified Medical Language System were evaluated. Face validity was evaluated by term feature importance.</p><p><strong>Results: </strong>There were 1123 encounters used for training, validation, and testing. Of the classic ML methods, XGBoost had the highest AU_PRC 0.9358 (95% CI 0.8945, 0.9681), accuracy 0.8874 (0.8402, 0.9349), and F1 score 0.8624 (0.7969, 0.9197) which performed comparably to ICD-10-CM codes [accuracy 0.8687 (0.8155, 0.9167); F1 0.8296 (0.7544, 0.8939)]. Excluding XGBoost, fine-tuned pre-trained language models generally outperformed classic ML methods. 
The best performing model by point estimate was the fine-tuned SMART-AI based model with domain adaptation [AU_PRC 0.9474 (0.9113, 0.9749); accuracy 0.8816 (0.8284, 0.9290); F1 0.8499 (0.7805, 0.9103)] but confidence intervals overlapped with other models. Explainability analyses showed the most predictive terms were \"heroin,\" \"opioids,\" \"alcoholic intoxication, chronic,\" \"cocaine,\" \"opiates,\" and \"suboxone.\"</p><p><strong>Conclusions: </strong>NLP-based models perform comparably to entry of ICD-10-CM diagnosis codes for the detection of ED encounters with opioid misuse. Fine tuning with domain adaptation for pre-trained language models resulted in improved performance.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag042"},"PeriodicalIF":3.4,"publicationDate":"2026-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13049196/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
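The AU_PRC values reported in this record can be estimated with the average-precision formulation (mean precision at each true-positive rank), which approximates the area under the precision-recall curve. A stdlib-only sketch with hypothetical labels and classifier scores:

```python
def average_precision(y_true, scores):
    """Average precision: mean of precision at each true-positive rank.
    Assumes no tied scores; an estimate of area under the PR curve."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    ap = 0.0
    n_pos = sum(y_true)
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / rank  # precision at this rank
    return ap / n_pos if n_pos else 0.0

# Hypothetical opioid-misuse labels and model scores for six encounters
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
ap = average_precision(labels, scores)  # ≈ 0.917
```

In practice `sklearn.metrics.average_precision_score` computes the same quantity; the explicit loop shows why a single misranked positive pulls the score down.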
JAMIA Open. Pub Date: 2026-04-03. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag038
Taowei D Wang, Darren W Henderson, Griffin M Weber, Michele Morris, Eugene M Sadhu, Shawn N Murphy, Shyam Visweswaran, Jeff G Klann
{"title":"Understanding data differences across the ENACT federated research network.","authors":"Taowei D Wang, Darren W Henderson, Griffin M Weber, Michele Morris, Eugene M Sadhu, Shawn N Murphy, Shyam Visweswaran, Jeff G Klann","doi":"10.1093/jamiaopen/ooag038","DOIUrl":"10.1093/jamiaopen/ooag038","url":null,"abstract":"<p><strong>Objective: </strong>Federated research networks, like Evolve to Next-Gen Accrual of patients to Clinical Trials (ENACT), aim to facilitate medical research by exchanging electronic health record (EHR) data. However, poor data quality can hinder this goal. While networks typically set guidelines and standards to address this problem, we developed an organically evolving, data-centric method using patient counts to identify data quality issues, applicable even to sites not yet in the network.</p><p><strong>Materials and methods: </strong>We distribute high-performance patient counting scripts as part of Integrating Biology at the Bedside (i2b2), which all ENACT sites operate. They produce counts of patients associated with ENACT ontology terms for each site. At the ENACT Hub, our pipeline aggregates site-contributed counts to produce network statistics, which our self-service web application, Data Quality Explorer (DQE), ingests to help sites conduct data quality investigation relative to the network.</p><p><strong>Results: </strong>Thirteen ENACT sites have contributed their patient counts, and currently ten sites have signed up to use DQE to analyze data quality issues. We announced a call to all ENACT sites to contribute additional patient counts.</p><p><strong>Discussion: </strong>Identifying site data quality problems relative to the network is novel. Using a metric based on evolving network statistics complements rigid data quality checks. 
It is adaptable to any network and has low barriers of entry, with patient counting being the sole requirement.</p><p><strong>Conclusion: </strong>We implemented a metric for conducting data quality investigation in ENACT using patient counting and network statistics. Our end-to-end pipeline is privacy-preserving and the underlying design is generalizable.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag038"},"PeriodicalIF":3.4,"publicationDate":"2026-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13049201/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
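One way to operationalize "data quality relative to the network" from per-term patient counts is a z-score check against the distribution of counts at peer sites. This is only an illustrative sketch of the idea, not the actual metric implemented in DQE, and the counts are toy values:

```python
from statistics import mean, stdev

def flag_terms(site_counts, network_counts, z_threshold=2.0):
    """Flag ontology terms whose patient count at this site deviates
    strongly from the distribution of counts across network sites."""
    flagged = []
    for term, counts in network_counts.items():
        # need the term locally and enough peer sites for a stable spread
        if term not in site_counts or len(counts) < 3:
            continue
        mu = mean(counts)
        sd = stdev(counts)
        if sd > 0 and abs(site_counts[term] - mu) / sd > z_threshold:
            flagged.append(term)
    return flagged

# Toy data: patient counts per ontology term at peer sites vs this site
network = {"diabetes": [900, 950, 1000, 980], "rare_dx": [10, 12, 9, 11]}
site = {"diabetes": 940, "rare_dx": 400}  # rare_dx count looks anomalous
```

Because only aggregate counts cross site boundaries, a check like this stays privacy-preserving in the sense the record describes.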
JAMIA Open. Pub Date: 2026-04-02. eCollection Date: 2026-04-01. DOI: 10.1093/jamiaopen/ooag037
Teenu Xavier, Jane M Carrington, Joshua W Lambert
{"title":"Detecting stigmatizing language with large language models: mind the settings.","authors":"Teenu Xavier, Jane M Carrington, Joshua W Lambert","doi":"10.1093/jamiaopen/ooag037","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooag037","url":null,"abstract":"<p><strong>Background: </strong>Stigmatizing language in clinical documentation can contribute to healthcare disparities and affect patient-provider relationships. Given their strong capacity for contextual language understanding, large language models (LLMs) offer potential for detecting and reducing such language. This study evaluates the accuracy of LLMs in detecting stigmatizing language, focusing on model size, temperature settings, and the inclusion of examples.</p><p><strong>Methods: </strong>We evaluated multiple configurations of 2 local Llama-based LLMs, Llama 3.2 (3B) and Llama 3.1 (8B), with varying temperature settings (0.25, 0.5, 0.75) and the inclusion of example prompts. The models were evaluated on 3643 de-identified clinical notes obtained from a tertiary care teaching hospital. Performance was assessed using accuracy, True Positive Rate (TPR), and True Negative Rate (TNR), with human annotator performance used as a benchmark.</p><p><strong>Results: </strong>The 8B model with a temperature of 0.25 and examples achieved the highest overall accuracy (70.2%), with the best TPR (94.1%), but the lowest TNR (47.4%). The 3B model without examples achieved the highest TNR (99.7%) but a very low TPR (2%). The inclusion of examples improved model accuracy across all configurations, while temperature settings had a variable impact, with smaller models benefiting from higher temperatures and larger models performing better at lower temperatures. Among note types, emergency department (ED) provider notes showed the highest accuracy (69.4%) and plan-of-care notes the lowest (55.8%).</p><p><strong>Conclusion: </strong>Model size, temperature, and the inclusion of examples play a critical role in optimizing open-source LLM performance. 
Tailoring these parameters to note types enhances effectiveness. Further research should refine these models for broader clinical application and assess their potential to reduce bias in healthcare documentation.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"9 2","pages":"ooag037"},"PeriodicalIF":3.4,"publicationDate":"2026-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13044510/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
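The accuracy/TPR/TNR trade-off this record reports (e.g., the 8B model's high TPR but low TNR) is easy to reproduce from a confusion matrix; the labels below are toy values, not the study's annotated notes:

```python
def classification_rates(y_true, y_pred):
    """Accuracy, true positive rate (sensitivity), and true negative rate
    (specificity) for binary labels (1 = stigmatizing language present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, tpr, tnr

# A high-TPR/low-TNR pattern like the 8B configuration's: every positive
# is caught, but negatives are over-flagged
truth = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 1, 1, 0, 0, 0]
acc, tpr, tnr = classification_rates(truth, preds)
```

Reporting all three rates together, as the study does, is what exposes configurations that buy sensitivity at the cost of specificity.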