JAMIA Open: latest articles

Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction.
IF 3.4
JAMIA Open Pub Date : 2025-09-30 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf115
Behzad Naderalvojoud, Catherine Curtin, Steven M Asch, Keith Humphreys, Tina Hernandez-Boussard
Objectives: The growing use of machine learning (ML) in healthcare raises concerns about how data biases affect real-world model performance. While existing frameworks evaluate algorithmic fairness, they often overlook the impact of bias on generalizability and clinical utility, which are critical for safe deployment. Building on prior methods, this study extends bias analysis to include clinical utility, addressing a key gap between fairness evaluation and decision-making.
Materials and methods: We applied a 3-phase evaluation to a previously developed model predicting prolonged opioid use (POU), validated on Veterans Health Administration (VHA) data. The analysis included internal and external validation, model retraining on VHA data, and subgroup evaluation across demographic, vulnerable, risk, and comorbidity groups. We assessed performance using area under the receiver operating characteristic curve (AUROC), calibration, and decision curve analysis, incorporating standardized net benefit to evaluate clinical utility alongside fairness and generalizability.
Results: The internal cohort (N = 41 929) had a 14.7% POU prevalence, compared to 34.3% in the external VHA cohort (N = 397 150). The model's AUROC decreased from 0.74 in the internal test cohort to 0.70 in the full external cohort. Subgroup-level performance averaged 0.69 (SD = 0.01), showing minimal deviation from the external cohort overall. Retraining on VHA data improved AUROCs to 0.82. Clinical utility analysis showed systematic shifts in net benefit across threshold probabilities.
Discussion: While the POU model showed generalizability and fairness internally, external validation and retraining revealed performance and utility shifts across subgroups.
Conclusion: Population-specific biases affect clinical utility, an often-overlooked dimension in fairness evaluation; accounting for them is key to ensuring equitable benefits across diverse patient groups.
Volume 8, issue 5, article ooaf115. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483547/pdf/
Citations: 0
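The decision curve analysis above reports standardized net benefit. As a minimal sketch, assuming binary outcome labels and predicted probabilities, the standard net benefit formula NB = TP/N - (FP/N) * pt/(1 - pt) can be computed at a threshold probability pt and divided by outcome prevalence to give the standardized form. The function names and toy data are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of intervening on everyone whose predicted risk is >= threshold.

    NB = TP/N - (FP/N) * (pt / (1 - pt)), with pt the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def standardized_net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit divided by outcome prevalence, so a perfect model scores 1.0."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


# Toy example (hypothetical data): 4 outcome-positive and 6 outcome-negative patients.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1, 0.05]
for pt in (0.2, 0.5):
    print(f"threshold={pt}: sNB={standardized_net_benefit(y, p, pt):.3f}")
```

Because the standardization divides by prevalence, curves from cohorts with different outcome rates (14.7% internal vs 34.3% external here) end up on a common scale where 1.0 is the best achievable value.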
Developing a real-time registry to track breast cancer patients across the city of Boston.
IF 3.4
JAMIA Open Pub Date : 2025-09-29 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf099
Amy M LeClair, Clara A Chen, Marisa L Mizzoni, William G Adams, William F Harvey, Christopher W Shanahan, Jennifer S Haas, Stephenie C Lemon, Tracy Battaglia, Karen M Freund
Objectives: Patient navigation is designed to identify and address patients' needs throughout their cancer treatment. In the context of a clinical trial designed to deliver a standardized patient navigation protocol, a registry was needed to allow users from multiple health systems to input patient data, track navigation outreach, and coordinate cancer care in real time. The objective was to design a registry that would allow patient navigators (PNs) at 6 medical centers across 4 health systems to track breast cancer patients determined to be most at risk for delays in treatment.
Materials and methods: A multidisciplinary team chose REDCap to host the registry. The aim was to develop a platform that would (1) manage a caseload of patients who are most vulnerable to delays; (2) track patients through the continuum of cancer care in real time; (3) allow PNs to prioritize certain patients; (4) facilitate inter-system communication; and (5) allow the research team to monitor navigators' activities (in the context of a research study, for supervision and feedback).
Results: The registry was built through collaboration with clinical providers, PNs, informatics specialists, and expert developers from the REDCap team, using the software's standard features and adding functionality through SAS programming.
Conclusion: REDCap provided an accessible and modifiable platform for hosting a registry to track patients in real time. However, it did not streamline PNs' workflows or reduce data entry burdens as intended. A major barrier was the lack of interoperability with the pre-existing systems that navigators use, which led to redundancy and increased the documentation burden.
Volume 8, issue 5, article ooaf099. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12478474/pdf/
Citations: 0
Primary care physicians' experiences with inbox triage.
IF 3.4
JAMIA Open Pub Date : 2025-09-26 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf105
Adam Rule, Rutvi Shah, Christina Dudley, Mark A Micek, Brian G Arndt
Objective: Many primary care physicians (PCPs) feel overwhelmed by the number of electronic health record inbox messages they receive. The objective of this study was to characterize PCPs' experiences with inbox triage, the process of reviewing inbox messages and deciding when and how to address them.
Materials and methods: We conducted 3 focus groups and 1 individual interview with 9 PCPs at an academic medical center and coded the transcripts for themes related to inbox triage.
Results: We identified 5 themes in PCPs' experiences with inbox triage: (1) inbox triage is a continuous process; (2) inbox triage involves different team members performing multiple activities, including identifying messages better addressed through synchronous care, preparing messages to be reviewed by PCPs, and prioritizing messages; (3) PCPs prioritize messages based on multiple factors, including clinical urgency, time constraints, and team member involvement; (4) team support for inbox triage varies by clinical experience, team stability, and co-location; and (5) patient expectations and clinic practices help make inbox triage a continuous process, requiring PCPs to establish personal policies to constrain inbox work.
Discussion: Designers of clinic workflows, healthcare policy, and health information technology should aim to support the diverse activities involved in inbox triage, message prioritization based on multiple factors, and the collaborative process of establishing and communicating messaging norms.
Conclusion: Inbox triage is a collaborative and continuous process requiring PCPs to evaluate multiple aspects of each message, find time to address those messages during busy clinic days, and negotiate different expectations for messaging behavior.
Volume 8, issue 5, article ooaf105. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470652/pdf/
Citations: 0
medspacyV: a graphical user interface for the open source medspaCy natural language processing package.
IF 3.4
JAMIA Open Pub Date : 2025-08-23 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf094
Bharath Velamala, Elham Sagheb Hossein Pour, Michael Lin, Jungwei Wilfred Fan
Objectives: To enable users with a modest technical background to perform biomedical natural language processing (NLP).
Materials and methods: We developed medspacyV using the Python tkinter graphical user interface library, following the model-view-controller (MVC) design pattern. The interface wraps a rule-based pipeline for sentence splitting, section segmentation, concept identification, and negation detection.
Results: The primary window allows the user to configure the project and NLP rules, execute the pipeline, and save the outputs to a table. A separate annotation viewer window can be launched to inspect the current or previous NLP outputs.
Discussion: We developed medspacyV with three rationales: controllability, explainability, and economy. The rule-based approach is sufficient for many NLP use cases.
Conclusion: The medspacyV program is publicly available at https://github.com/medspacy/medspacyV and is intended for use by healthcare professionals and researchers in their NLP projects.
Volume 8, issue 4, article ooaf094. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12374723/pdf/
Citations: 0
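To make the negation-detection step in the pipeline above concrete, here is a simplified, self-contained NegEx-style rule. It does not use medspaCy or its API. It only sketches the kind of logic a rule-based negation step performs: a trigger word within a few tokens before a concept negates it, unless a terminator such as "but" intervenes. The trigger lists, the window size, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny rule sets; NegEx/ConText-style systems use far larger lexicons.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for", "no evidence of"]
TERMINATORS = {"but", "however"}
WINDOW = 5  # how many tokens before a concept a trigger may appear (assumed value)


def _is_negated(tokens: List[str], start: int) -> bool:
    """Scan backwards from a concept for a negation trigger, stopping at a terminator."""
    for j in range(start - 1, max(-1, start - 1 - WINDOW), -1):
        if tokens[j] in TERMINATORS:
            return False
        for trigger in NEGATION_TRIGGERS:
            t = trigger.split()
            if tokens[j:j + len(t)] == t:
                return True
    return False


def tag_concepts(sentence: str, concepts: List[str]) -> List[Tuple[str, bool]]:
    """Return (concept, is_negated) for every occurrence of each target concept."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results = []
    for concept in concepts:
        c = concept.lower().split()
        for i in range(len(tokens) - len(c) + 1):
            if tokens[i:i + len(c)] == c:
                results.append((concept, _is_negated(tokens, i)))
    return results


print(tag_concepts("Patient denies chest pain but reports fever.", ["chest pain", "fever"]))
# -> [('chest pain', True), ('fever', False)]
```

Keeping the rules as plain lists is what makes this style of approach controllable and explainable: every negation decision can be traced to a specific trigger or terminator.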
Defining dyadic cancer pain concordance using participant-initiated interactions with a remote health monitoring system.
IF 3.4
JAMIA Open Pub Date : 2025-08-22 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf088
Mina Ostovari, Natalie Crimp, Sarah J Ratcliffe, Virginia LeBaron
Background: Studies on symptom concordance between patients and their caregivers often use cross-sectional designs, which may fail to capture the longitudinal, dynamic symptom experience. The Behavioral and Environmental Sensing and Intervention for Cancer (BESI-C) is a remote health monitoring system that uses smartwatches and ecological momentary assessments (EMAs) to empower patients and caregivers to monitor and manage cancer pain at home. BESI-C collects real-time symptom data in naturalistic settings, enabling longitudinal tracking and analysis of symptom patterns over time.
Objective: To define and examine dyadic concordance using participant-initiated symptom reports collected via remote health monitoring.
Methods: Dyads of patients with advanced cancer and their family caregivers were recruited to use BESI-C for 2 weeks, reporting pain in real time through EMAs. We used Bangdiwala's B statistic to determine the concordance of patient-reported pain and caregiver-reported perceived patient pain under different contextual criteria (eg, co-location of participants; user engagement with BESI-C) that we hypothesized would impact concordance. We also explored the hypothesis that concordance would improve from study week 1 to week 2.
Results: Data from 21 patient-caregiver dyads were used for analysis. The reporting of pain events was highly variable between patients and their caregivers. Concordance of pain reporting improved when patients and caregivers were co-located and both wearing their BESI-C smartwatches. We did not observe consistent patterns in patient-caregiver concordance between week 1 and week 2.
Conclusion: We propose an analytical approach to define and evaluate concordance between patients' and caregivers' real-time symptom reports that can be applied to dyadic, longitudinal symptom data collected using remote health monitoring. Future work should examine how patient-caregiver symptom concordance relates to key quality-of-life metrics and to sociodemographic factors that impact participant engagement with remote health monitoring technologies.
Volume 8, issue 4, article ooaf088. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373113/pdf/
Citations: 0
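Bangdiwala's B, named in the methods above, compares the squared diagonal (agreement) counts of a square agreement table with the products of the matching row and column totals: B = sum_i n_ii^2 / sum_i (n_i+ * n_+i), which equals 1.0 for perfect agreement. The sketch below assumes that patient and caregiver reports have already been paired into a k x k table (for example, pain reported / not reported in matched time windows). The abstract does not say how the pairing was done, so the table and the example counts are hypothetical.

```python
from typing import Sequence


def bangdiwala_b(table: Sequence[Sequence[int]]) -> float:
    """Bangdiwala's B for a square k x k agreement table.

    table[i][j] counts observations rated category i by rater 1 (e.g. the patient)
    and category j by rater 2 (e.g. the caregiver).
    B = sum_i n_ii^2 / sum_i (n_i+ * n_+i); 1.0 means perfect agreement.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    numerator = sum(table[i][i] ** 2 for i in range(k))
    denominator = sum(row_totals[i] * col_totals[i] for i in range(k))
    if denominator == 0:
        raise ValueError("no category has counts from both raters")
    return numerator / denominator


# Hypothetical 2 x 2 table: rows = patient (pain / no pain), columns = caregiver.
print(bangdiwala_b([[8, 2], [3, 7]]))  # (64 + 49) / (10*11 + 10*9) = 113 / 200
```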
Assessing the advantages and disadvantages of dimensionality reduction methods in summarizing housing determinants of health in the United States.
IF 3.4
JAMIA Open Pub Date : 2025-08-18 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf093
Xingyu Chen, Christopher Kitchen, Hadi Kharrazi
Objectives: To evaluate and compare different dimensionality reduction techniques for quantifying housing conditions as a social determinant of health (SDOH) across various geographic levels in the United States.
Materials and methods: A total of 15 housing characteristics from the American Community Survey data were analyzed at county, ZIP code, and Census tract levels. The robustness of 3 dimensionality reduction techniques was assessed in reducing the 15 housing characteristics into 1 housing score. These summarization methods included principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). We visualized geographic distributions of the housing scores, assessed methodological discrepancies between the techniques, and analyzed agreement between housing characteristic variability and housing score variability.
Results: The selected dimensionality reduction methods generated housing scores that demonstrated acceptable face validity when visualized through choropleth maps. The PCA method provided the most stable and consistent results across geographic levels. The PCA method also resulted in the highest correlation between the variability of the underlying housing characteristics and the summarized housing score.
Discussion: Data-driven summarization techniques provide an alternative approach to traditional expert-based indices in capturing housing conditions as a single SDOH factor. In this study, among the different summarized housing scores, the PCA-generated score offered superior robustness, persistent data structure, and higher stability across years.
Conclusion: Principal component analysis was identified as the most reliable and interpretable approach for summarizing housing conditions across geographic levels. These findings contribute to the methodological foundation required to develop robust SDOH measures that can inform public health policies and address health disparities.
Volume 8, issue 4, article ooaf093. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360777/pdf/
Citations: 0
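The PCA approach that performed best above reduces many housing characteristics to one score by projecting standardized data onto the first principal component. The NumPy sketch below shows that core step on a hypothetical areas x characteristics matrix. It is not the authors' pipeline: the American Community Survey variables, any preprocessing, and the tSNE/UMAP comparisons are not modeled. The sign convention, which orients the component so its loadings sum to a non-negative value, is an assumption, since a principal component's sign is arbitrary.

```python
import numpy as np


def pca_housing_score(X: np.ndarray) -> tuple[np.ndarray, float]:
    """Collapse an (areas x characteristics) matrix into one score per area.

    Returns the first-principal-component score of each row and the share of
    total variance explained by that component.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    if np.any(std == 0):
        raise ValueError("drop constant columns before running PCA")
    # Standardize so characteristics measured on different scales weigh equally.
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal axes; squared singular values give component variances.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary; fix it so loadings sum to >= 0.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    return scores, explained


# Hypothetical data: 200 areas x 15 correlated housing characteristics.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ rng.normal(size=(1, 15)) + 0.5 * rng.normal(size=(200, 15))
scores, explained = pca_housing_score(X)
print(scores.shape, round(explained, 2))
```

Because PCA is a fixed linear projection, the same loadings can be reapplied to new years or geographic levels. This property is consistent with, though not a proof of, the stability the authors report. tSNE and UMAP are nonlinear and typically re-fit per dataset.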
Benchmarking of pre-training strategies for electronic health record foundation models.
IF 3.4
JAMIA Open Pub Date : 2025-08-13 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf090
Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour
Objective: Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.
Materials and methods: We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort was 405 679 patients prescribed antihypertensives, and the fine-tuning cohort was 5525 patients who received doxorubicin.
Results: Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. However, when the model was evaluated on the task of 12-month mortality prediction, the self-supervised model performed best.
Discussion: While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.
Conclusion: Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.
Volume 8, issue 4, article ooaf090. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349770/pdf/
Citations: 0
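The self-supervised strategy above relies on masked language modeling. The abstract does not give the masking scheme, so the sketch below assumes the original BERT recipe applied to a sequence of EHR codes. About 15% of positions are selected. A selected code is replaced by a [MASK] token 80% of the time, by a random vocabulary code 10% of the time, and kept unchanged 10% of the time. The model is trained to recover the original codes at the selected positions. The code strings and the function name are hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_codes(codes: List[str], vocab: List[str], mask_prob: float = 0.15,
               seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking of one patient's sequence of EHR codes (assumed recipe).

    Returns (inputs, labels): labels holds the original code at selected
    positions (the prediction targets) and None everywhere else.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for code in codes:
        if rng.random() < mask_prob:
            labels.append(code)  # the model must predict this code
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)  # 80%: hide the code
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random replacement
            else:
                inputs.append(code)  # 10%: keep the original
        else:
            labels.append(None)  # not a training target
            inputs.append(code)
    return inputs, labels


# Hypothetical code sequence for one patient.
sequence = ["ICD10:I10", "RX:lisinopril", "LAB:creatinine", "ICD10:E11.9"] * 5
print(mask_codes(sequence, vocab=sorted(set(sequence)), seed=42))
```

Supervised pre-training would instead attach a labeled outcome (for example, a cardiac event) to each sequence. This matches the abstract's finding that it helps most when the pre-training label aligns with the downstream task.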
EchoLLM: extracting echocardiogram entities with light-weight, open-source large language models.
IF 3.4
JAMIA Open Pub Date : 2025-08-13 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf092
Jonathan Chi, Yazan Rouphail, Ethan Hillis, Ningning Ma, An Nguyen, Jane Wang, Mackenzie Hofford, Aditi Gupta, Patrick G Lyons, Adam Wilcox, Albert M Lai, Philip R O Payne, Marin H Kollef, Caitlin Dreisbach, Andrew P Michelson
Objectives: Large language models (LLMs) have demonstrated high levels of performance in clinical information extraction compared to rule-based systems and traditional machine-learning approaches, offering scalability, contextualization, and easier deployment. However, most studies rely on proprietary models with privacy concerns and high costs, limiting accessibility. We aim to evaluate 14 publicly available open-source LLMs for extracting clinically relevant findings from free-text echocardiogram reports and examine the feasibility of their implementation in information extraction workflows.
Materials and methods: We used 14 open-source LLMs to extract clinically relevant entities from echocardiogram reports (n = 507). Each report was manually annotated by 2 independent health-care professionals and adjudicated by a third. Lexical variance and length of each echocardiogram report were collected. Precision, recall, and F1 scores were calculated for the 9 extracted entities via multiclass classification.
Results: In aggregate, Gemma2:9b-instruct had the highest precision, recall, and F1 scores at 0.973 (0.962-0.983), 0.959 (0.947-0.973), and 0.965 (0.951-0.975), respectively. In comparison, Phi3:3.8b-mini-instruct had the lowest precision score at 0.831 (0.804-0.856), while Gemma:7b-instruct had the lowest recall and F1 scores at 0.382 (0.356-0.408) and 0.392 (0.356-0.428), respectively.
Discussion and conclusion: Using LLMs for entity extraction from echocardiogram reports has the potential to support both clinical research and health-care delivery. Our work demonstrates the feasibility of using open-source models for more efficient computation and extraction.
Volume 8, issue 4, article ooaf092. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349756/pdf/
Citations: 0
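The results above report precision, recall, and F1 with confidence intervals. One common way to obtain such numbers, sketched here under assumptions, is to treat each report's extractions as a set of (entity, value) pairs and micro-average true positives, false positives, and false negatives across reports. A percentile bootstrap that resamples reports then gives an interval. The abstract describes its scoring only as "multiclass classification" and does not state how the intervals were computed, so the pair schema, the bootstrap, and the function names are hypothetical.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (entity name, extracted value); hypothetical schema


def prf(gold: List[Set[Pair]], pred: List[Set[Pair]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all reports."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # extracted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # extracted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bootstrap_f1_ci(gold: List[Set[Pair]], pred: List[Set[Pair]], n_boot: int = 1000,
                    alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for F1, resampling reports with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    f1s = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        f1s.append(prf([gold[i] for i in idx], [pred[i] for i in idx])[2])
    f1s.sort()
    return f1s[int((alpha / 2) * n_boot)], f1s[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical annotations for two reports.
gold = [{("lvef", "55%"), ("mitral_regurgitation", "mild")}, {("lvef", "35%")}]
pred = [{("lvef", "55%")}, {("lvef", "35%"), ("aortic_stenosis", "severe")}]
print(prf(gold, pred))  # TP = 2, FP = 1, FN = 1
```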
Evaluating prompt and data perturbation sensitivity in large language models for radiology reports classification.
IF 3.4
JAMIA Open Pub Date : 2025-08-12 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf073
Vera Sorin, Jeremy D Collins, Alex K Bratt, Joanna E Kusmirek, Vamshi K Mugu, Timothy L Kline, Crystal L Butler, Nadia G Wood, Cole J Cook, Panagiotis Korfiatis
Objectives: Large language models (LLMs) offer potential in natural language processing tasks in healthcare. Given the need for high accuracy, understanding their limitations is essential. The purpose of this study was to evaluate the performance of LLMs in classifying radiology reports for the presence of pulmonary embolism (PE) under various conditions, including different prompt designs and data perturbations.
Materials and methods: In this retrospective, institutional review board-approved study, we evaluated 3 Google LLMs (Gemini-1.5-Pro, Gemini-1.5-Flash-001, and Gemini-1.5-Flash-002) in classifying 11 999 pulmonary CT angiography radiology reports for PE. Ground truth labels were determined by concordance between a computer vision-based PE detection (CVPED) algorithm and multiple LLM runs under various configurations. Discrepancies between the algorithms' classifications were aggregated and manually reviewed. We evaluated the effects of prompt design, data perturbations, and repeated analyses across geographic cloud regions. Performance metrics were calculated.
Results: Of 11 999 reports, 1296 (10.8%) were PE-positive. Accuracy across LLMs ranged between 0.953 and 0.996. The highest recall (up to 0.997) was achieved with a prompt modified after a review of the misclassified cases. Few-shot prompting improved recall (up to 0.99), while chain-of-thought prompting generally degraded performance. Gemini-1.5-Flash-002 demonstrated the highest robustness against data perturbations. Geographic cloud region variability was minimal for Gemini-1.5-Pro, while the Flash models showed stable performance.
Discussion and conclusion: LLMs demonstrated high performance in classifying radiology reports, though results varied with prompt design and data quality. These findings underscore the need for systematic evaluation and validation of LLMs for clinical applications, particularly in high-stakes scenarios.
Volume 8, issue 4, article ooaf073. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343119/pdf/
Citations: 0
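The perturbation analysis above compares a classifier's behavior on original and altered reports. The harness below is a minimal sketch of that idea. It measures accuracy on both versions and the "flip rate", the share of reports whose predicted label changes. The abstract does not list the perturbations used, and no Gemini API call is shown here. `keyword_classifier` is a toy, hypothetical stand-in for an LLM call, and `swap_adjacent_chars` is one hypothetical typo-style perturbation.

```python
import random
from typing import Callable, Dict, List


def swap_adjacent_chars(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Simulate typos by swapping adjacent letters at roughly the given rate (hypothetical)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def keyword_classifier(report: str) -> int:
    """Toy stand-in for an LLM: 1 if PE is mentioned and not explicitly negated."""
    text = report.lower()
    return int("pulmonary embolism" in text and "no pulmonary embolism" not in text)


def perturbation_report(classify: Callable[[str], int], reports: List[str],
                        labels: List[int], perturb: Callable[[str], str]) -> Dict[str, float]:
    """Accuracy on original vs perturbed reports, plus the share of flipped predictions."""
    original = [classify(r) for r in reports]
    perturbed = [classify(perturb(r)) for r in reports]
    n = len(reports)
    return {
        "accuracy_original": sum(o == y for o, y in zip(original, labels)) / n,
        "accuracy_perturbed": sum(p == y for p, y in zip(perturbed, labels)) / n,
        "flip_rate": sum(o != p for o, p in zip(original, perturbed)) / n,
    }


reports = ["Acute pulmonary embolism in the right lower lobe.", "No pulmonary embolism."]
print(perturbation_report(keyword_classifier, reports, [1, 0],
                          lambda s: swap_adjacent_chars(s, rate=0.2)))
```

A brittle keyword rule like this one flips easily under typos. Measuring the flip rate is one way to quantify the robustness differences the study reports between models.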
Ensemble learning to enhance accurate identification of patients with glaucoma using electronic health records.
IF 3.4
JAMIA Open Pub Date : 2025-08-10 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf080
Tushar Mungle, Behzad Naderalvojoud, Chris A Andrews, Hong Su An, Amanda Bicket, Amy Zhang, Julie Rosenthal, Wen-Shin Lee, Chase A Ludwig, Bethlehem Mekonnen, Suzann Pershing, Joshua D Stein, Tina Hernandez-Boussard
Objectives: Existing ophthalmology studies of clinical phenotype identification in real-world datasets (RWD) rely exclusively on structured data elements (SDE). We evaluated the performance, generalizability, and fairness of multimodal ensemble models that integrate real-world SDE and free-text data, compared with SDE-only models, for identifying patients with glaucoma.
Materials and methods: This is a retrospective cross-sectional study involving 2 health systems: University of Michigan (UoM) and Stanford University (SU). It included 1728 patients visiting eye clinics during 2012-2021. Free-text embeddings extracted using BioClinicalBERT were combined with SDE. EditedNearestNeighbor (ENN) undersampling and Borderline-Synthetic Minority Over-sampling Technique (bSMOTE) addressed class imbalance. Lasso Regression (LR), Random Forest (RF), and Support Vector Classifier (SVC) models were trained on UoM imbalanced (imb) and resampled data, along with a bagging ensemble method. Models were externally validated with SU data. Fairness was assessed using equalized odds difference (EOD) and Target Probability Difference (TPD).
Results: Among 900 and 828 patients from UoM and SU, 10% and 23%, respectively, had glaucoma as confirmed by ophthalmologists. At UoM, multimodal LR_imb (F1 = 76.60 [61.90-88.89]; AUROC = 95.41 [87.01-99.63]) outperformed, on F1, unimodal RF_imb (F1 = 69.77 [52.94-83.64]; AUROC = 97.72 [95.95-99.18]) and the ICD-coding method (F1 = 53.01 [39.51-65.43]; AUROC = 90.10 [84.59-93.93]). Bagging (BM = LR_ENN + LR_bSMOTE) improved performance, achieving an F1 of 83.02 [70.59-92.86] and an AUROC of 97.59 [92.98-99.88]. During external validation, BM achieved the highest F1 (68.47 [62.61-73.75]), outperforming unimodal (F1 = 51.26 [43.80-58.13]) and multimodal LR_imb (F1 = 62.46 [55.95-68.24]). BM EOD revealed lower disparities for sex (<0.1), race (<0.5), and ethnicity (<0.5), and BM had the least uncertainty in TPD analysis compared with traditional models.
Discussion: Multimodal ensemble models integrating structured and unstructured EHR data outperformed traditional SDE-only models while achieving fair predictions across demographic subgroups. Among ensemble methods, bagging demonstrated better generalizability than stacking, particularly when training data are limited.
Conclusion: This approach can enhance phenotype discovery to enable future research studies using RWD, leading to better patient management and clinical outcomes.
Volume 8, issue 4, article ooaf080. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342940/pdf/
Citations: 0