Felix J Dorfner, Amin Dada, Felix Busch, Marcus R Makowski, Tianyu Han, Daniel Truhn, Jens Kleesiek, Madhumita Sushil, Lisa C Adams, Keno K Bressem
{"title":"Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.","authors":"Felix J Dorfner, Amin Dada, Felix Busch, Marcus R Makowski, Tianyu Han, Daniel Truhn, Jens Kleesiek, Madhumita Sushil, Lisa C Adams, Keno K Bressem","doi":"10.1093/jamia/ocaf045","DOIUrl":"https://doi.org/10.1093/jamia/ocaf045","url":null,"abstract":"<p><strong>Objectives: </strong>Large language models (LLMs) have shown potential in biomedical applications, leading to efforts to fine-tune them on domain-specific data. However, the effectiveness of this approach remains unclear. This study aims to critically evaluate the performance of biomedically fine-tuned LLMs against their general-purpose counterparts across a range of clinical tasks.</p><p><strong>Materials and methods: </strong>We evaluated the performance of biomedically fine-tuned LLMs against their general-purpose counterparts on clinical case challenges from NEJM and JAMA, and on multiple clinical tasks, such as information extraction, document summarization and clinical coding. We used a diverse set of benchmarks specifically chosen to be outside the likely fine-tuning datasets of biomedical models, ensuring a fair assessment of generalization capabilities.</p><p><strong>Results: </strong>Biomedical LLMs generally underperformed compared to general-purpose models, especially on tasks not focused on probing medical knowledge. While on the case challenges, larger biomedical and general-purpose models showed similar performance (eg, OpenBioLLM-70B: 66.4% vs Llama-3-70B-Instruct: 65% on JAMA), smaller biomedical models showed more pronounced underperformance (OpenBioLLM-8B: 30% vs Llama-3-8B-Instruct: 64.3% on NEJM). Similar trends appeared across CLUE benchmarks, with general-purpose models often achieving higher scores in text generation, question answering, and coding. 
Notably, biomedical LLMs also showed a higher tendency to hallucinate.</p><p><strong>Discussion: </strong>Our findings challenge the assumption that biomedical fine-tuning inherently improves LLM performance, as general-purpose models consistently performed better on unseen medical tasks. Retrieval-augmented generation may offer a more effective strategy for clinical adaptation.</p><p><strong>Conclusion: </strong>Fine-tuning LLMs on biomedical data may not yield the anticipated benefits. Alternative approaches, such as retrieval augmentation, should be further explored for effective and reliable clinical integration of LLMs.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chao Yan, Monika E Grabowska, Rut Thakkar, Alyson L Dickson, Peter J Embí, QiPing Feng, Joshua C Denny, Vern Eric Kerchberger, Bradley A Malin, Wei-Qi Wei
{"title":"Beyond Phecodes: leveraging PheMAP to identify patients lacking diagnosis codes in electronic health records.","authors":"Chao Yan, Monika E Grabowska, Rut Thakkar, Alyson L Dickson, Peter J Embí, QiPing Feng, Joshua C Denny, Vern Eric Kerchberger, Bradley A Malin, Wei-Qi Wei","doi":"10.1093/jamia/ocaf055","DOIUrl":"https://doi.org/10.1093/jamia/ocaf055","url":null,"abstract":"<p><strong>Objective: </strong>Diagnosis codes documented in electronic health records (EHR) are often relied upon to clinically phenotype patients for biomedical research. However, these diagnoses can be incomplete and inaccurate, leading to false negatives when searching for patients with phenotypes of interest. This study aims to determine whether PheMAP, a comprehensive knowledgebase integrating multiple clinical terminologies beyond diagnosis to capture phenotypes, can effectively identify patients lacking relevant EHR diagnosis codes.</p><p><strong>Materials and methods: </strong>We investigated a collection of 3.5 million patient records from Vanderbilt University Medical Center's EHR and focused on 4 well-studied phenotypes: (1) type 2 diabetes mellitus (T2DM), (2) dementia, (3) prostate cancer, and (4) sensorineural hearing loss. We applied PheMAP to match structured concepts in patient records and calculated a phenotype risk score (PheScore) to indicate patient-phenotype similarity. Patients meeting predefined PheScore criteria but lacking diagnosis codes were identified. Clinically knowledgeable experts adjudicated randomly selected patients per phenotype as Positive, Possibly Positive, or Negative.</p><p><strong>Results: </strong>Our approach indicated that 5.3% of patients lacked a diagnosis for T2DM, 4.5% for dementia, 2.2% for prostate cancer, and 0.2% for sensorineural hearing loss. 
The expert review indicated 100% precision (for Possibly Positive or Positive cases) for dementia and sensorineural hearing loss, and 90.0% and 85.0% precision for T2DM and prostate cancer, respectively. Excluding Possibly Positive cases, the precision for T2DM and prostate cancer was 88.9% and 81.3%, respectively.</p><p><strong>Conclusions: </strong>Leveraging clinical terminologies incorporated by PheMAP can effectively identify patients with phenotypes who lack EHR diagnosis codes, thereby enhancing phenotyping quality and related research reliability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143744134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nidhi Soley, Ilia Rattsev, Traci J Speed, Anping Xie, Kadija S Ferryman, Casey Overby Taylor
{"title":"Predicting postoperative chronic opioid use with fair machine learning models integrating multi-modal data sources: a demonstration of ethical machine learning in healthcare.","authors":"Nidhi Soley, Ilia Rattsev, Traci J Speed, Anping Xie, Kadija S Ferryman, Casey Overby Taylor","doi":"10.1093/jamia/ocaf053","DOIUrl":"https://doi.org/10.1093/jamia/ocaf053","url":null,"abstract":"<p><strong>Objective: </strong>Building upon our previous work on predicting chronic opioid use using electronic health records (EHR) and wearable data, this study leveraged the Health Equity Across the AI Lifecycle (HEAAL) framework to (a) fine tune the previously built model with genomic data and evaluate model performance in predicting chronic opioid use and (b) apply IBM's AIF360 pre-processing toolkit to mitigate bias related to gender and race and evaluate the model performance using various fairness metrics.</p><p><strong>Materials and methods: </strong>Participants included approximately 271 All of Us Research Program subjects with EHR, wearable, and genomic data. We fine-tuned 4 machine learning models on the new dataset. The SHapley Additive exPlanations (SHAP) technique identified the best-performing predictors. A preprocessing toolkit boosted fairness by gender and race.</p><p><strong>Results: </strong>The genetic data enhanced model performance from the prior model, with the area under the curve improving from 0.90 (95% CI, 0.88-0.92) to 0.95 (95% CI, 0.89-0.95). Key predictors included Dopamine D1 Receptor (DRD1) rs4532, general type of surgery, and time spent in physical activity. The reweighing preprocessing technique applied to the stacking algorithm effectively improved the model's fairness across racial and gender groups without compromising performance.</p><p><strong>Conclusion: </strong>We leveraged 2 dimensions of the HEAAL framework to build a fair artificial intelligence (AI) solution. 
Multi-modal datasets (including wearable and genetic data) and applying bias mitigation strategies can help models to more fairly and accurately assess risk across diverse populations, promoting fairness in AI in healthcare.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abed A Hijleh, Sophia Wang, Danilo C Berton, Igor Neder-Serafini, Sandra Vincent, Matthew James, Nicolle Domnik, Devin Phillips, Luiz E Nery, Denis E O'Donnell, J Alberto Neder
{"title":"AI-Techniques Loss-Based Algorithm for Severity Classification (ATLAS): a novel approach for continuous quantification of exertional symptoms during incremental exercise testing.","authors":"Abed A Hijleh, Sophia Wang, Danilo C Berton, Igor Neder-Serafini, Sandra Vincent, Matthew James, Nicolle Domnik, Devin Phillips, Luiz E Nery, Denis E O'Donnell, J Alberto Neder","doi":"10.1093/jamia/ocaf051","DOIUrl":"https://doi.org/10.1093/jamia/ocaf051","url":null,"abstract":"<p><strong>Objective: </strong>Heightened muscular effort and breathlessness (dyspnea) are disabling sensory experiences. We sought to improve the current approach of assessing these symptoms only at the maximal effort to new paradigms based on their continuous quantification throughout cardiopulmonary exercise testing (CPET).</p><p><strong>Materials and methods: </strong>After establishing sex- and age-adjusted reference centiles (0-10 Borg scale), we developed a novel algorithm (AI-Techniques Loss-Based Algorithm for Severity Classification [ATLAS]) based on reciprocal exponential loss for CPET data from patients with chronic obstructive lung disease of varied severity.</p><p><strong>Results: </strong>Categories of dyspnea intensity by ATLAS, but not dyspnea at peak exercise, correctly discriminated patients in progressively higher resting and exercise impairment (P < .05).</p><p><strong>Discussion: </strong>This new AI-techniques approach will be translated to the care of disabled patients to uncover the seeds and consequences of their activity-related symptoms.</p><p><strong>Conclusions: </strong>We used innovative informatics research to change paradigms in displaying, quantifying, and analyzing effort-related symptoms in patient populations.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-27","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gibong Hong, Veronica Hindle, Nadine M Veasley, Hannah D Holscher, Halil Kilicoglu
{"title":"DiMB-RE: mining the scientific literature for diet-microbiome associations.","authors":"Gibong Hong, Veronica Hindle, Nadine M Veasley, Hannah D Holscher, Halil Kilicoglu","doi":"10.1093/jamia/ocaf054","DOIUrl":"https://doi.org/10.1093/jamia/ocaf054","url":null,"abstract":"<p><strong>Objectives: </strong>To develop a corpus annotated for diet-microbiome associations from the biomedical literature and train natural language processing (NLP) models to identify these associations, thereby improving the understanding of their role in health and disease, and supporting personalized nutrition strategies.</p><p><strong>Materials and methods: </strong>We constructed DiMB-RE, a comprehensive corpus annotated with 15 entity types (eg, Nutrient, Microorganism) and 13 relation types (eg, increases, improves) capturing diet-microbiome associations. We fine-tuned and evaluated state-of-the-art NLP models for named entity, trigger, and relation extraction as well as factuality detection using DiMB-RE. In addition, we benchmarked 2 generative large language models (GPT-4o-mini and GPT-4o) on a subset of the dataset in zero- and one-shot settings.</p><p><strong>Results: </strong>DiMB-RE consists of 14 450 entities and 4206 relationships from 165 publications (including 30 full-text Results sections). Fine-tuned NLP models performed reasonably well for named entity recognition (0.800 F1 score), while end-to-end relation extraction performance was modest (0.445 F1). The use of Results section annotations improved relation extraction. The impact of trigger detection was mixed. Generative models showed lower accuracy compared to fine-tuned models.</p><p><strong>Discussion: </strong>To our knowledge, DiMB-RE is the largest and most diverse corpus focusing on diet-microbiome interactions. Natural language processing models fine-tuned on DiMB-RE exhibit lower performance compared to similar corpora, highlighting the complexity of information extraction in this domain. 
Misclassified entities, missed triggers, and cross-sentence relations are the major sources of relation extraction errors.</p><p><strong>Conclusion: </strong>DiMB-RE can serve as a benchmark corpus for biomedical literature mining. DiMB-RE and the NLP models are available at https://github.com/ScienceNLP-Lab/DiMB-RE.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fahad Kamran, Donna Tjandra, Thomas S Valley, Hallie C Prescott, Nigam H Shah, Vincent X Liu, Eric Horvitz, Jenna Wiens
{"title":"Reformulating patient stratification for targeting interventions by accounting for severity of downstream outcomes resulting from disease onset: a case study in sepsis.","authors":"Fahad Kamran, Donna Tjandra, Thomas S Valley, Hallie C Prescott, Nigam H Shah, Vincent X Liu, Eric Horvitz, Jenna Wiens","doi":"10.1093/jamia/ocaf036","DOIUrl":"https://doi.org/10.1093/jamia/ocaf036","url":null,"abstract":"<p><strong>Objectives: </strong>To quantify differences between (1) stratifying patients by predicted disease onset risk alone and (2) stratifying by predicted disease onset risk and severity of downstream outcomes. We perform a case study of predicting sepsis.</p><p><strong>Materials and methods: </strong>We performed a retrospective analysis using observational data from Michigan Medicine at the University of Michigan (U-M) between 2016 and 2020 and the Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2012. We measured the correlation between the estimated sepsis risk and the estimated effect of sepsis on mortality using Spearman's correlation. We compared patients stratified by sepsis risk with patients stratified by sepsis risk and effect of sepsis on mortality.</p><p><strong>Results: </strong>The U-M and BIDMC cohorts included 7282 and 5942 ICU visits; 7.9% and 8.1% developed sepsis, respectively. Among visits with sepsis, 21.9% and 26.3% experienced mortality at U-M and BIDMC. The effect of sepsis on mortality was weakly correlated with sepsis risk (U-M: 0.35 [95% CI: 0.33-0.37], BIDMC: 0.31 [95% CI: 0.28-0.34]). High-risk patients identified by both stratification approaches overlapped by 66.8% and 52.8% at U-M and BIDMC, respectively. 
Accounting for risk of mortality identified an older population (U-M: age = 66.0 [interquartile range-IQR: 55.0-74.0] vs age = 63.0 [IQR: 51.0-72.0], BIDMC: age = 74.0 [IQR: 61.0-83.0] vs age = 68.0 [IQR: 59.0-78.0]).</p><p><strong>Discussion: </strong>Predictive models that guide selective interventions ignore the effect of disease on downstream outcomes. Reformulating patient stratification to account for the estimated effect of disease on downstream outcomes identifies a different population compared to stratification on disease risk alone.</p><p><strong>Conclusion: </strong>Models that predict the risk of disease and ignore the effects of disease on downstream outcomes could be suboptimal for stratification.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rohan Sanghera, Arun James Thirunavukarasu, Marc El Khoury, Jessica O'Logbon, Yuqing Chen, Archie Watt, Mustafa Mahmood, Hamid Butt, George Nishimura, Andrew A S Soltan
{"title":"High-performance automated abstract screening with large language model ensembles.","authors":"Rohan Sanghera, Arun James Thirunavukarasu, Marc El Khoury, Jessica O'Logbon, Yuqing Chen, Archie Watt, Mustafa Mahmood, Hamid Butt, George Nishimura, Andrew A S Soltan","doi":"10.1093/jamia/ocaf050","DOIUrl":"https://doi.org/10.1093/jamia/ocaf050","url":null,"abstract":"<p><strong>Objective: </strong>Abstract screening is a labor-intensive component of systematic review involving repetitive application of inclusion and exclusion criteria on a large volume of studies. We aimed to validate large language models (LLMs) used to automate abstract screening.</p><p><strong>Materials and methods: </strong>LLMs (GPT-3.5 Turbo, GPT-4 Turbo, GPT-4o, Llama 3 70B, Gemini 1.5 Pro, and Claude Sonnet 3.5) were trialed across 23 Cochrane Library systematic reviews to evaluate their accuracy in zero-shot binary classification for abstract screening. Initial evaluation on a balanced development dataset (n = 800) identified optimal prompting strategies, and the best performing LLM-prompt combinations were then validated on a comprehensive dataset of replicated search results (n = 119 695).</p><p><strong>Results: </strong>On the development dataset, LLMs exhibited superior performance to human researchers in terms of sensitivity (LLMmax = 1.000, humanmax = 0.775), precision (LLMmax = 0.927, humanmax = 0.911), and balanced accuracy (LLMmax = 0.904, humanmax = 0.865). When evaluated on the comprehensive dataset, the best performing LLM-prompt combinations exhibited consistent sensitivity (range 0.756-1.000) but diminished precision (range 0.004-0.096) due to class imbalance. 
In addition, 66 LLM-human and LLM-LLM ensembles exhibited perfect sensitivity with a maximal precision of 0.458 with the development dataset, decreasing to 0.1450 over the comprehensive dataset; but conferring workload reductions ranging between 37.55% and 99.11%.</p><p><strong>Discussion: </strong>Automated abstract screening can reduce the screening workload in systematic review while maintaining quality. Performance variation between reviews highlights the importance of domain-specific validation before autonomous deployment. LLM-human ensembles can achieve similar benefits while maintaining human oversight over all records.</p><p><strong>Conclusion: </strong>LLMs may reduce the human labor cost of systematic review with maintained or improved accuracy, thereby increasing the efficiency and quality of evidence synthesis.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sahil Sandhu, Michael Liu, Laura M Gottlieb, A Jay Holmgren, Lisa S Rotenstein, Matthew S Pantell
{"title":"Interoperability of health-related social needs data at US hospitals.","authors":"Sahil Sandhu, Michael Liu, Laura M Gottlieb, A Jay Holmgren, Lisa S Rotenstein, Matthew S Pantell","doi":"10.1093/jamia/ocaf049","DOIUrl":"https://doi.org/10.1093/jamia/ocaf049","url":null,"abstract":"<p><strong>Objective: </strong>To measure hospital engagement in interoperable exchange of health-related social needs (HRSN) data.</p><p><strong>Materials and methods: </strong>This study combined national data from the 2022 American Hospital Association (AHA) Annual Survey, AHA IT Supplement, and the Centers for Medicare and Medicaid Services Impact File. Multivariable logistic regression was used to identify hospital characteristics associated with receiving HRSN data from external organizations.</p><p><strong>Results: </strong>Of 2502 hospitals, 61.4% reported electronically receiving HRSN data from external sources, most commonly from health information exchange organizations. Hospitals participating in accountable care organizations or patient-centered medical homes and hospitals using Epic or Cerner electronic health records (EHRs) were more likely to receive external HRSN data. 
In contrast, for-profit hospitals and public hospitals were less likely to participate in HRSN data exchange.</p><p><strong>Discussion: </strong>Hospital ownership, participation in value-based care models, and EHR vendor capabilities are important drivers in advancing HRSN data exchange.</p><p><strong>Conclusion: </strong>Additional policy and technological support may be needed to enhance HRSN data interoperability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143674776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua M Biro, Jessica L Handley, James Mickler, Sahithi Reddy, Varsha Kottamasu, Raj M Ratwani, Nathan K Cobb
{"title":"The value of simulation testing for the evaluation of ambient digital scribes: a case report.","authors":"Joshua M Biro, Jessica L Handley, James Mickler, Sahithi Reddy, Varsha Kottamasu, Raj M Ratwani, Nathan K Cobb","doi":"10.1093/jamia/ocaf052","DOIUrl":"https://doi.org/10.1093/jamia/ocaf052","url":null,"abstract":"<p><strong>Objectives: </strong>The objective of this work is to demonstrate the value of simulation testing for rapidly evaluating artificial intelligence (AI) products.</p><p><strong>Materials and methods: </strong>Researcher-physician teams simulated the use of 2 Ambient Digital Scribe (ADS) products by reading scripts of outpatient encounters while using both products, yielding a total of 44 draft notes. Time to edit, perceived amount of effort and editing, and errors in the AI-generated draft notes were analyzed.</p><p><strong>Results: </strong>Ambient Digital Scribe Product A draft notes took significantly longer to edit, had fewer omissions, and more additions and irrelevant or misplaced text errors than ADS Product B. Ambient Digital Scribe Product A was rated as performing better for most encounters.</p><p><strong>Discussion: </strong>Artificial intelligence-enabled products are being rapidly developed and implemented into practice, outpacing safety concerns. 
Simulation testing can efficiently identify safety issues.</p><p><strong>Conclusion: </strong>Simulation testing is a crucial first step to take when evaluating AI-enabled technologies.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143674778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tom Arthur, Sophie Robinson, Samuel Vine, Lauren Asare, G J Melendez-Torres
{"title":"Equity implications of extended reality technologies for health and procedural anxiety: a systematic review and implementation-focused framework.","authors":"Tom Arthur, Sophie Robinson, Samuel Vine, Lauren Asare, G J Melendez-Torres","doi":"10.1093/jamia/ocaf047","DOIUrl":"https://doi.org/10.1093/jamia/ocaf047","url":null,"abstract":"<p><strong>Objectives: </strong>Extended reality (XR) applications are gaining support as a method of reducing anxieties about medical treatments and conditions; however, their impacts on health service inequalities remain underresearched. We therefore undertook a synthesis of evidence relating to the equity implications of these types of interventions.</p><p><strong>Materials and methods: </strong>Searches of MEDLINE, Embase, APA PsycINFO, and Epistemonikos were conducted in May 2023 to identify reviews of patient-directed XR interventions for health and procedural anxiety. Equity-relevant data were extracted from records (n = 56) that met these criteria, and from individual trials (n = 63) evaluated within 5 priority reviews. Analyses deductively categorized data into salient situation- and technology-related mechanisms, which were then developed into a novel implementation-focused framework.</p><p><strong>Results: </strong>Analyses highlighted various mechanisms that impact on the availability, accessibility, and/or acceptability of services aiming to reduce patient health and procedural anxieties. On one hand, results showed that XR solutions offer unique opportunities for addressing health inequities, especially those concerning transport, cost, or mobility barriers. 
At the same time, however, these interventions can accelerate areas of inequity or even engender additional disparities.</p><p><strong>Discussion: </strong>Our \"double jeopardy, common impact\" framework outlines unique pathways through which XR could help address health disparities, but also accelerate or even generate inequity across different systems, communities, and individuals. This framework can be used to guide prospective interventions and assessments.</p><p><strong>Conclusion: </strong>Despite growing positive assertions about XR's capabilities for managing patient anxieties, we emphasize the need for taking a cautious, inclusive approach to implementation in future programs.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}