Journal of the American Medical Informatics Association: Latest Articles

The detectability paradox: bilingual medical report generation with open-weight models and the limits of human oversight.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-08 DOI: 10.1093/jamia/ocag070
Hossein Rouhizadeh, Abiram Sandralegar, Anthony Yazdani, Weibo Feng, Oren Schreier, Yonnou Ahn-Kim, Assiya Sirbal, Valentino Pirelli, Rui Yang, Lukas Sveikata, Elena Tessitore, Nan Liu, Philippe Bijlenga, Douglas Teodoro
Abstract
Objectives: The automation of medical report generation using large language models (LLMs) could significantly reduce physicians' documentation burden while enhancing healthcare efficiency. However, the misuse of generative artificial intelligence in medical reporting can lead to important safety risks for patients. We addressed 2 questions: (1) What is the quality of medical reports generated by LLMs in English and French? and (2) Can we distinguish between human-written and LLM-generated medical reports?
Materials and methods: We evaluated the quality of reports generated by several multilingual, open-weight LLMs using text similarity metrics on 4212 medical reports in English and French across multiple specialties. A bilingual expert panel of certified physicians (n = 4) and medical residents (n = 5) scored accuracy, fluency, and completeness of generated reports using a 1-5 Likert scale. Experts also completed a Turing-like test, blindly identifying reports as human or machine-generated.
Results: Phi-4 achieved the best overall performance (ROUGE-1: 0.70, BERTScore: 0.83). Expert evaluation confirmed high-quality reports in both languages (overall 4.6/5.0). Medical experts performed better than chance but struggled to differentiate human versus machine reports (accuracy: 0.60). Automatic classifiers showed strong performance (accuracy: 0.98).
Discussion: The high quality of LLM-generated reports supports their potential to enhance healthcare efficiency in multilingual settings. However, the discrepancy between human detection difficulty and automated detection success reveals inherent limitations in relying solely on human oversight for quality assurance and misuse prevention.
Conclusions: Deployment of LLMs for medical reporting requires combining automated detection tools with human expertise to ensure patient safety. Dataset and code: https://github.com/ds4dh/medical_report_generation.
Citations: 0
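As a rough illustration of the ROUGE-1 metric reported in the abstract above, the sketch below computes a ROUGE-1 F1 score from clipped unigram overlap between a reference report and a generated report. The function name, the lowercase whitespace tokenization, and the example sentences are assumptions made for illustration; the study's own evaluation code lives in its repository and may tokenize and aggregate differently.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1 from clipped unigram overlap (hypothetical helper).

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only, not data from the study.
score = rouge1_f1("the patient has fever", "the patient has cough")
```

Here three of four unigrams match in both directions, so precision and recall are both 0.75. A score like the reported 0.70 would normally be an average over many report pairs rather than a single comparison.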
Enhancing validation of case-control omics signatures through "minimalist" single-subject analysis (N-of-1 trials): proof of concept in sepsis.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-07 DOI: 10.1093/jamia/ocag061
Liam S Wilson, Nima Pouladi, Rachel F Nelson, Elizabeth A Middleton, Neal D Tolley, Mahdieh Shabanian, Colleen Kenost, Robert A Campbell, Matthew T Rondina, Yves A Lussier
Abstract
Objective: To evaluate if a single-subject study (S3) design, utilizing paired transcriptome samples from the same patient (eg, "sepsis" vs "recovered"), can replicate transcriptomic signatures from small case-control studies, addressing challenges in patient accrual for rare or sub-stratified diseases.
Methods: We generated a sepsis gene signature (SGS) comprising 300 differentially expressed genes (DEGs; FDR < 5%) from a human sepsis case-control cohort using general linear models (GLMs). Reproducibility of SGS was assessed through three approaches applied to sub-sampled independent datasets: single-subject analyses (N-of-1-MixEnrich), anticipated to perform better; conventional paired-sample GLM analyses; and a traditional case-control GLM analysis.
Results: SGS reproducibility in GLM analyses was inconsistent at smaller cohort sizes (~80% reproducibility; n = 5) but stabilized at cohort sizes >6. Remarkably, the single-subject-study approach consistently reproduced SGS in each of the 18 subjects individually (100% reproducibility; n = 1).
Discussion: Conventional GLMs are not designed for single-subject or small cohort analyses due to their dependence on larger samples to mitigate variable dispersion and human heterogeneity. In contrast, S3 methods enhance statistical power by: reducing multiple testing through gene set aggregation, emphasizing concordant changes in pathway activity rather than exact molecular consistency, and exploiting paired samples from the same individual.
Conclusion: This proof-of-concept demonstrates that S3 designs effectively validate gene expression signatures derived from case-control studies, highlighting their potential in research or clinical trials constrained by small sample sizes. However, further validation and computational simulation are needed to demonstrate scalability to other conditions and sensitivity to validation subject variations from the "average subject" of discovery cohorts.
Citations: 0
Hybrid care engagement phenotypes and glycemic outcomes in diabetes: a cluster analysis across two health systems.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-06 DOI: 10.1093/jamia/ocag063
Namuun Clifford, Kathryn E Kemper-McIsaac, Haoxiang Yu, Taylor Rapson, Urmimala Sarkar, Elaine C Khoong
Abstract
Objective: Prior studies often examine single telehealth encounter types or aggregate all digital care, overlooking how patients combine multiple digital and in-person modalities in hybrid care. To address this gap, we derived hybrid care engagement phenotypes and assessed sociodemographic differences and associations with glycemic control among adults with type 2 diabetes (T2DM).
Methods: We conducted a retrospective cohort study of 10 671 adults with T2DM receiving primary care at an academic (UCSF) or safety-net system (SFHN) from April 2021 to March 2023. K-medoids clustering was applied to five encounter modalities (in-person, video, telephone visits; portal messages; unscheduled telephone calls) to derive four engagement phenotypes. We assessed sociodemographic differences using chi-square and Kruskal-Wallis tests and evaluated associations between phenotype and follow-up HbA1c control using logistic regression. We tested interactions with baseline HbA1c and estimated predicted probabilities using Tukey-adjusted contrasts.
Results: Four phenotypes emerged per system: Digitally Engaged Multimodal, Traditional High Utilizers, Digitally Leaning (UCSF), Telephone Reliant (SFHN), and Low Digital. UCSF patients belonged to digitally forward phenotypes, whereas SFHN patients concentrated in traditional, lower-tech phenotypes. Among patients with uncontrolled diabetes, digitally forward phenotypes had 13-20 percentage points higher predicted probability of achieving control (UCSF: 56% Digitally Leaning vs 36% Traditional; SFHN: 53% Multimodal vs 40% Telephone).
Discussion: Phenotypes varied by health system and sociodemographic factors, with modest, system-specific associations between digitally forward phenotypes and glycemic control among patients with uncontrolled diabetes. Findings underscore structural and sociodemographic inequities in hybrid care engagement and the need for proactive, tailored strategies to promote equitable hybrid care.
Citations: 0
Disparate language and model effects on AI-based translation and recognition of genetic conditions.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-06 DOI: 10.1093/jamia/ocag067
Dat Duong, Irini Manoli, Shubha R Phadke, Chanika Phornphutkul, Jonathan D Raymond, Benjamin D Solomon
Abstract
Introduction: Artificial intelligence (AI) is increasingly prevalent. Patients and clinicians may use AI-based tools in many different languages.
Objective: To investigate AI translation tools for descriptions of genetic conditions and how AI identification of genetic conditions is affected by translations.
Materials and methods: We used neural machine translation (NMT) and large language model (LLM) translation to translate descriptions of 40 genetic conditions into 191 and 93 languages, respectively. Excluding translations retaining English medical terms verbatim, we respectively focused on 139 and 70 languages. After assessing translations, we assessed the ability of 3 proprietary and 3 open-weight general LLMs to identify conditions in the translations. We analyzed how accuracy was affected by the conditions' prevalence in the literature, and attributes of the languages (the script, language family, and prevalence of the language in training sources). We also investigated adaptive translation for select languages.
Results: We found significant differences in condition identification based on the translation method, condition, language, and prediction model. The accuracy of some models was more affected than others by factors like the conditions' literature prevalence, language script, family, and language prevalence. Adaptive translation for select languages did not improve translations or diagnostic accuracy with the 3 tested LLMs. However, further analysis with 1 language showed that this approach was more effective with smaller LLMs.
Conclusions: AI-based translation has variable performance, which can affect the ability of AI models to recognize genetic conditions. These findings should inform safe medical AI use to support consistent performance in different languages.
Citations: 0
TRUST: a large language model-based dialogue system for trauma understanding and structured assessments.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-06 DOI: 10.1093/jamia/ocag050
Sichang Tu, Abigail Powers, Stephen Doogan, Jinho D Choi
Abstract
Objectives: While large language models (LLMs) have been widely used to assist clinicians and support patients, no existing work has explored dialogue systems for standard diagnostic interviews and assessments. This study aims to bridge the gap in mental healthcare accessibility by developing an LLM-powered dialogue system that replicates clinician behavior.
Materials and methods: We introduce TRUST, a framework of cooperative LLM modules capable of conducting formal diagnostic interviews and assessments for post-traumatic stress disorder (PTSD) following the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5). To guide the generation of appropriate clinical responses, we propose a Dialogue Acts schema specifically designed for clinical interviews. Additionally, we develop a patient simulation approach based on real-life interview transcripts to replace time-consuming and costly manual testing by clinicians.
Results: A comprehensive set of evaluation metrics is designed to assess the dialogue system from both the agent and patient simulation perspectives. Expert evaluations by conversation and clinical specialists show that TRUST performs comparably to real-life clinical interviews.
Discussion: Our system performs with clinical quality approaching that of human clinicians, with room for future enhancements in communication styles and response appropriateness.
Conclusions: Our TRUST framework shows its potential to facilitate mental healthcare availability.
Citations: 0
Engineering biomarker representations of vital signs data enhances deep learning mortality prediction.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-02 DOI: 10.1093/jamia/ocag066
Behrooz Mamandipoor, Isabella Shen, Chun-Nan Hsu, Rodney A Gabriel
Abstract
Objectives: We evaluated bidirectional long short-term memory models for predicting inpatient mortality using different approaches to processing vital signs data collected during the initial 24 h of intensive care unit (ICU) admissions.
Materials and methods: We compared 3 vital-sign representations: (1) raw data recorded every 5 min, (2) preprocessed data averaged hourly, and (3) preprocessed data using biomarker representations that extend a digital oximetry biomarker toolbox of the PhysioZoo software, applied to blood pressure, heart rate, temperature, respiratory rate, and SpO2.
Results: Across 2 large ICU datasets, HiRID and eICU, models trained on the frequency-normalized representation achieved higher discrimination and lower Brier scores than those trained on raw 5-min and hourly averaged data.
Discussion: The use of biomarker representations of vital signs yielded the largest improvements in discrimination and overall probabilistic performance reflected by lower Brier scores for predicting inpatient mortality by deep learning.
Conclusion: Thus, we recommend using a similar approach to vital signs preprocessing for time-series predictive models.
Citations: 0
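Two of the ideas in the abstract above are easy to sketch: averaging 5-minute vital-sign samples into hourly values, and scoring predicted mortality probabilities with the Brier score (mean squared difference between predicted probability and observed outcome; lower is better). The function names, the fixed 12-samples-per-hour assumption, and the toy values are illustrative only; the biomarker representation itself comes from the PhysioZoo toolbox and is not reproduced here.

```python
from statistics import mean


def hourly_means(samples, per_hour=12):
    """Average consecutive 5-min samples into hourly values.

    Assumption: samples are evenly spaced with no gaps, so 12 samples
    make one hour. A trailing partial hour is averaged over what exists.
    """
    return [mean(samples[i:i + per_hour])
            for i in range(0, len(samples), per_hour)]


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


# Toy heart-rate trace: one hour at 60 bpm, then one hour at 72 bpm.
heart_rate = [60] * 12 + [72] * 12
hourly = hourly_means(heart_rate)

# Toy predictions for two admissions (died = 1, survived = 0).
score = brier_score([0.9, 0.2], [1, 0])
```

Hourly averaging discards within-hour variability, which is one plausible reason a richer representation could help, although the abstract does not state the mechanism.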
Optimising pain identification in resource-limited emergency departments using transfer learning and fine-tuned language models.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-01 DOI: 10.1093/jamia/ocag054
Yutong Wu, James A Hughes, Chantelle Judge, Casey Appo, Anthony Nguyen
Abstract
Objective: To optimise the identification of patients presenting with pain in emergency department (ED) settings with limited resources using multiple transfer learning techniques.
Methods: Two strategies were explored: (1) fine-tuning a pre-trained language model, previously fine-tuned on data from a well-resourced ED, using labelled data from a target ED, and (2) continual pre-training using task-specific unlabelled data to enhance clinical text classification.
Results: With 2000 labelled samples from a target ED, the combined strategies achieved an F1-score of 92%, demonstrating significant benefits of transfer learning in resource-constrained settings.
Discussion: Accurately identifying pain in patients upon arrival to the ED is crucial for timely and effective treatment. Findings suggest that combining both transfer learning strategies can significantly enhance pain identification performance in resource-constrained settings.
Conclusion: Combining fine-tuning on labelled data and continual pre-training on unlabelled data has potential to optimise model performance in both resource-constrained and well-resourced settings, highlighting the broader applicability and potential of these techniques for improving clinical text classification.
Citations: 0
Fairness aware subset selection for advancing equity in skin cancer detection.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-01 DOI: 10.1093/jamia/ocag028
Yehuda Perry, Abdulaziz A Almuzaini, Adewole S Adamson, Bahar Dasgeb, David J Foran, Vivek K Singh
Abstract
Objectives: Skin cancer is the most common malignancy in the United States, with more than five million cases diagnosed annually among 3.3 million individuals. Melanoma, the deadliest form of skin cancer, accounts for roughly 200 000 new diagnoses each year and nearly 10 000 deaths. AI-based skin cancer detection is being developed and tested in laboratory and academic settings as a promising approach to improve access and reduce disparities. However, current models often underperform on darker skin tones (Fitzpatrick Types V and VI), creating fairness concerns that must be addressed prior to clinical deployment. Existing fairness-aware methods focus on algorithmic adjustments while neglecting data quality and representation. We introduce FAIR-SCAN (Fairness and Accuracy through Ranking-Based Subset Selection for Skin Cancer Detection), a data-centric framework that enhances fairness through subset selection guided by marginal contribution score (MCS) estimation.
Materials and methods: FAIR-SCAN ranks data points by their contribution to both accuracy and fairness, then selects an optimal subset for training. We evaluated its effectiveness using images from Diverse Dermatology Images (DDI) and Fitzpatrick 17K.
Results: FAIR-SCAN improved balance in accuracy, True Positive Rate, and False Positive Rate across skin tones while reducing the training dataset by 50%, outperforming algorithm-focused fairness methods.
Discussion: These findings highlight the importance of strategic data selection in mitigating bias in AI-driven diagnostics. FAIR-SCAN's data-centric approach enhances both precision and equity in skin cancer detection.
Conclusion: Strategic data selection is critical for equitable AI-driven diagnostics. FAIR-SCAN advances fairness and accuracy in skin cancer detection, supporting development of trustworthy clinical AI systems.
Pages: 1009-1017. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13127651/pdf/
Citations: 0
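The abstract above reports balance in True Positive Rate and False Positive Rate across skin tones. One simple way to quantify such balance is the largest between-group gap in each rate. The sketch below does only that; it is not FAIR-SCAN or its marginal contribution score, and the function names, group labels, and toy data are assumptions made for illustration.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def group_gaps(y_true, y_pred, groups):
    """Largest between-group difference in TPR and in FPR (hypothetical helper)."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    per_group = [rates(ts, ps) for ts, ps in by_group.values()]
    tprs = [r[0] for r in per_group]
    fprs = [r[1] for r in per_group]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data: the classifier is perfect on one group and weaker on the other.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["lighter"] * 4 + ["darker"] * 4
tpr_gap, fpr_gap = group_gaps(y_true, y_pred, groups)
```

In this toy case both gaps are 0.5; a method that improves balance would be expected to shrink them toward zero.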
Interactive active learning for literature screening: finetuning GPT with DeepSeek reasoning for cross-domain generalization.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-01 DOI: 10.1093/jamia/ocag014
Yiming Li, Joseph M Plasek, Xinsong Du, Yifei Wang, Zhengyang Zhou, John Lian, Ya-Wen Chuang, Pengyu Hong, Peter C Hou, Li Zhou
Abstract
Objective: Automated literature screening in biomedical research is often hindered by domain shifts and scarcity of labeled data, which limit model accuracy and generalizability. While large language models (LLMs) perform well in zero-shot settings, they often fail to capture complex, domain-specific reasoning patterns. To address this limitation, this study investigates whether an interactive, weakly supervised learning framework combining GPT (generative pre-trained transformer)'s fine-tuning adaptability with DeepSeek's reasoning capabilities can improve literature screening performance across biomedical domains.
Materials and methods: We developed an active learning framework that leverages model disagreement between GPT-4o and DeepSeek to improve literature screening performance. This process began with a labeled corpus of 6331 articles on large language models, from which a model disagreement analysis was performed to identify cases where GPT-4o misclassified and DeepSeek produced correct predictions. Three GPT variants (GPT-4o, GPT-4o-mini, and GPT-4.1-nano) were fine-tuned under standard supervised learning settings using these disagreement-based samples. Fine-tuning prompts incorporated classification labels and, when available, rationale traces generated by DeepSeek to provide reasoning-augmented weak supervision. Model performance was evaluated on an independent benchmark set of 291 annotated articles across 10 topic queries in cancer immunotherapy and LLMs in medicine, using standard evaluation metrics, with recall as the primary measure.
Results: Fine-tuning GPT models using disagreement-based examples significantly improved performance. GPT-4o-mini achieved the best overall results after fine-tuning, especially with the highest F1 score (0.93, P < .001) and recall (0.95, P < .001). Across the biomedical topics, fine-tuned models consistently outperformed their zero-shot counterparts without increasing reviewer workload.
Discussion: These findings demonstrate the effectiveness of disagreement-driven active learning in enhancing GPT-based biomedical literature screening. Lightweight models like GPT-4o-mini benefit most from targeted, reasoning-enriched training, highlighting their suitability for scalable deployment.
Conclusion: This study introduces an interactive active learning framework that leverages fine-tuned LLMs with reasoning capabilities to enhance literature screening. The approach offers a scalable solution to more efficient and reliable information retrieval in systematic reviews.
Pages: 1026-1036. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13127649/pdf/
Citations: 0
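The disagreement-based selection step described in the abstract above can be sketched as a simple filter: keep labeled articles that one model misclassified and the other classified correctly, then evaluate with recall and F1. The record keys (`label`, `gpt`, `deepseek`), the function names, and the toy records are hypothetical; the study's actual data format, prompts, and fine-tuning procedure are not shown in the abstract.

```python
def disagreement_samples(records):
    """Keep records where the GPT prediction is wrong and DeepSeek's is right.

    Assumption: each record is a dict with hypothetical keys
    'label', 'gpt', and 'deepseek' holding 0/1 include decisions.
    """
    return [r for r in records
            if r["gpt"] != r["label"] and r["deepseek"] == r["label"]]


def recall_and_f1(y_true, y_pred):
    """Recall and F1 for the positive (include) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy records, not data from the study.
records = [
    {"id": "a1", "label": 1, "gpt": 0, "deepseek": 1},  # selected
    {"id": "a2", "label": 0, "gpt": 0, "deepseek": 0},  # both correct
    {"id": "a3", "label": 1, "gpt": 0, "deepseek": 0},  # both wrong
    {"id": "a4", "label": 0, "gpt": 1, "deepseek": 0},  # selected
]
selected = disagreement_samples(records)
```

Recall is a natural primary measure for screening because a missed relevant article is usually costlier than an extra one sent for manual review.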
Current methods for analyzing time-series patient-generated health data to assess treatment response: a scoping review.
IF 4.6, Zone 2, Medicine
Journal of the American Medical Informatics Association Pub Date : 2026-05-01 DOI: 10.1093/jamia/ocag027
Michelo Banda, Sian Bladon, Mariam Al-Attar, Roberto Cahuantzi, David A Jenkins, William G Dixon, Sabine N van der Veer
Abstract
Objectives: We aimed to identify and map recent studies using high-frequency, time-series electronic patient-generated health data (ePGHD) to assess treatment response; characterize ePGHD types and collection methods; summarize ePGHD-based definitions of treatment response; and describe analytical approaches used.
Materials and methods: We systematically searched 4 databases for articles published between January 2022 and June 2024, supplemented by a forward citation search until June 2025. Peer-reviewed studies were eligible if ePGHD were collected outside clinical settings, and either reported at least weekly (ie, if actively reported by participants) or summarized discretely (eg, daily) if passively collected via wearables/sensors. We screened articles for eligibility independently in duplicate and synthesized extracted data descriptively.
Results: Our search yielded 4030 articles, of which we included 186. Most studies collected ePGHD using mobile applications or webforms (n = 133) over 4-12 weeks (n = 67). Prior to analysis, 132 studies excluded portions or condensed ePGHD into one or more summaries. Among 172 studies estimating treatment response, 98 applied longitudinal methods (eg, mixed-effects models) that accounted for repeated measures while capturing within- and between-subject variations, whereas 74 used cross-sectional approaches. Of 18 prediction modeling studies, 16 employed machine learning techniques, with only 4 explicitly modeling repeated measures. Five studies identified clusters of response trajectories generally without incorporating temporal dependencies (eg, using K-means).
Discussion and conclusion: Many studies in this review did not fully leverage the high-frequency, longitudinal nature of ePGHD. Future research should adopt more appropriate and readily available analytic methods to maximize the potential of time-series ePGHD for generating insights into treatment response.
Pages: 1065-1076. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13127660/pdf/
Citations: 0