Journal of the American Medical Informatics Association: latest articles (page 2)

A DeepSeek-powered locally deployed closed-loop system for enhancing quality control in electronic nursing documentation: development and clinical validation.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-16 DOI: 10.1093/jamia/ocaf109

Jinhong Lv, Yangyang Xu, Mengzhu Jiang, Yuanhao Lv, Jialu Sun, Jinming Lu, Lina Wang, Hongru Wang

Abstract
Objectives: To develop a locally deployed DeepSeek-powered closed-loop system for electronic nursing documentation quality control (QC) and evaluate its clinical efficacy through a multidimensional validation framework.
Materials and methods: We implemented a three-dimensional (3D) QC framework (real-time, final, and vertical QC). A retrospective analysis of 556 electronic nursing records was conducted to evaluate pre- and postimplementation outcomes, with documentation accuracy and audit efficiency assessed via blinded nurse evaluations.
Results: After implementation, omission rates decreased from 7.19% to 1.79%, the prevalence of logical inconsistencies decreased from 9.35% to 0.72%, and the prevalence of timeliness errors decreased from 8.63% to 0%. The QC time per record decreased by 3.2-fold. Nurse satisfaction was evaluated using the Clinical Nursing Information System Effectiveness Evaluation Scale (Zhao Y, Gu Y, Zhang X, et al. Developed the clinical nursing information system effectiveness evaluation scale based on the new D&M model and conducted reliability and validity evaluation. Chin J Prac Nurs. 2020;36:544-550. https://doi.org/10.3760/cma.j.issn.1672-7088.2020.07.013), yielding a total score of 102.73 ± 3.25 out of a maximum 115 points.
Discussion: This study demonstrates that the Artificial Intelligence (AI)-powered closed-loop QC system significantly enhances documentation accuracy and workflow efficiency while ensuring data security. The 3D framework (real-time, final, and vertical QC) represents a paradigm shift from reactive to proactive quality governance in nursing practice. High nurse satisfaction (102.73/115) confirms clinical viability, offering a scalable model for intelligent health-care quality ecosystems. Future work should explore federated learning for multicenter deployment and regulatory frameworks for clinical AI.
Conclusion: DeepSeek demonstrated robust efficacy in enhancing QC accuracy and workflow efficiency, with localized deployment ensuring data security. This system redefines nursing documentation management, heralding an era of "intelligent negative feedback" in health-care quality ecosystems.
Publication type: Journal Article.

Citations: 0

Diagnostic accuracy differences in detecting wound maceration between humans and artificial intelligence: the role of human expertise revisited.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-16 DOI: 10.1093/jamia/ocaf116

Florian Kücking, Ursula H Hübner, Dorothee Busch

Abstract
Objective: This study aims to compare the diagnostic abilities of humans in wound image assessment with those of an AI-based model, examine how "expertise" affects clinicians' diagnostic performance, and investigate the heterogeneity in clinical judgments.
Materials and methods: A total of 481 healthcare professionals completed a diagnostic task involving 30 chronic wound images with and without maceration. A convolutional neural network (CNN) classification model performed the same task. To predict human accuracy, participants' "expertise," ie, pertinent formal qualification, work experience, self-confidence, and wound focus, was analyzed in a regression analysis. Human interrater reliability was calculated.
Results: Human participants achieved an average accuracy of 79.3% and a maximum accuracy of 85% in the formally qualified group. Achieving 90% accuracy, the CNN performed better but not significantly. Pertinent formal qualification (β = 0.083, P < .001) and diagnostic self-confidence (β = 0.015, P = .002) significantly predicted human accuracy, while work experience and focus on wound care had no effect (R² = 24.3%). Overall interrater reliability was "fair" (Kappa = 0.391).
Discussion: Among the "expertise"-related factors, only the qualification and self-confidence variables influenced diagnostic accuracy. These findings challenge previous assumptions about work experience or job titles defining "expertise" and influencing human diagnostic performance.
Conclusion: This study offers guidance to future studies when comparing human expert and AI task performance. However, to explain human diagnostic accuracy, "expertise" may only serve as one correlate, while additional factors need further research.
Publication type: Journal Article.
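
The abstract reports an overall kappa across many raters but does not say which kappa statistic was used. As a rough illustration only, the sketch below assumes Fleiss' kappa, a common choice when more than two raters label the same items. The function name and the toy rating counts are hypothetical and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who put image i into category j.
    Every image must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        return 1.0  # all ratings fall in one category; agreement is trivial
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: 4 images, 5 raters, columns = (macerated, not macerated).
toy_counts = [[5, 0], [4, 1], [1, 4], [0, 5]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value near 0 means agreement no better than chance and 1 means perfect agreement, which is the scale on which the reported 0.391 is read.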

Citations: 0

Human-centered explainability evaluation in clinical decision-making: a critical review of the literature.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-14 DOI: 10.1093/jamia/ocaf110

Jenny M Bauer, Martin Michalowski

Abstract
Objectives: This review paper comprehensively summarizes healthcare provider (HCP) evaluation of explanations produced by explainable artificial intelligence methods to support point-of-care, patient-specific, clinical decision-making (CDM) within medical settings. It highlights the critical need to incorporate human-centered (HCP) evaluation approaches based on their CDM needs, processes, and goals.
Materials and methods: The review was conducted in Ovid Medline and Scopus databases, following the Institute of Medicine's methodological standards and PRISMA guidelines. An individual study appraisal was conducted using design-specific appraisal tools. MaxQDA software was used for data extraction and evidence table procedures.
Results: Of the 2673 unique records retrieved, 25 records were included in the final sample. Studies were excluded if they did not meet this review's definitions of HCP evaluation (1156), healthcare use (995), explainable AI (211), and primary research (285), and if they were not available in English (1). The sample focused primarily on physicians and diagnostic imaging use cases and revealed wide-ranging evaluation measures.
Discussion: The synthesis of sampled studies suggests a potential common measure of clinical explainability with 3 indicators of interpretability, fidelity, and clinical value. There is an opportunity to extend the current model-centered evaluation approaches to incorporate human-centered metrics, supporting the transition into practice.
Conclusion: Future research should aim to clarify and expand key concepts in HCP evaluation, propose a comprehensive evaluation model positioned in current theoretical knowledge, and develop a valid instrument to support comparisons.
Publication type: Journal Article.

Citations: 0

SNOMED CT entity linking challenge.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-14 DOI: 10.1093/jamia/ocaf104

Rory Davidson, Will Hardman, Guy Amit, Yonatan Bilu, Vincenzo Della Mea, Aleksandr Galaida, Irena Girshovitz, Mikhail Kulyabin, Mihai Horia Popescu, Kevin Roitero, Gleb Sokolov, Chen Yanover

Abstract
Objective: This paper presents the results from a competition challenging participants to develop entity linking models using a subset of annotated MIMIC-IV-Note data and the SNOMED CT Terminology.
Materials and methods: As a basis for this work, a large set of 74 808 annotations was curated across 272 discharge notes spanning 6624 unique clinical concepts. Submissions were evaluated using the mean Intersection-over-Union metric, evaluated at the character level with the 3 best performing solutions awarded a cash prize.
Results: The winning solutions employed contrasting approaches: a dictionary-based method, an encoder-based method, and a decoder-based method.
Discussion: Our analysis reveals that concept frequency in training data significantly impacts model performance, with rare concepts proving particularly challenging. High concept entropy and annotation ambiguity were also associated with decreased performance.
Conclusion: Findings from this work suggest that future projects should focus on improving entity linking for rare concepts and developing methods to better leverage contextual information when training examples are scarce.
Publication type: Journal Article.
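
The abstract names a mean Intersection-over-Union metric computed at the character level but does not spell out the averaging. The sketch below is one plausible reading, assuming IoU is computed per concept over the character offsets each side annotated and then averaged over all concepts that appear in either the gold or the predicted annotations. The span format, function names, and concept labels are hypothetical placeholders, not the challenge's official scorer.

```python
from collections import defaultdict


def char_sets(spans):
    """Map each concept ID to the set of character offsets it covers.

    spans is a list of (start, end, concept_id) with end exclusive.
    """
    covered = defaultdict(set)
    for start, end, concept_id in spans:
        covered[concept_id].update(range(start, end))
    return covered


def mean_char_iou(gold_spans, pred_spans):
    """Average per-concept IoU over character offsets (assumed macro average)."""
    gold, pred = char_sets(gold_spans), char_sets(pred_spans)
    ious = []
    for concept in set(gold) | set(pred):
        g = gold.get(concept, set())
        p = pred.get(concept, set())
        union = g | p
        if union:  # skip concepts that only have zero-length spans
            ious.append(len(g & p) / len(union))
    return sum(ious) / len(ious) if ious else 1.0


# Hypothetical note with placeholder concept labels (not real SNOMED CT codes).
gold = [(0, 10, "concept_A"), (20, 30, "concept_B")]
pred = [(0, 5, "concept_A"), (20, 30, "concept_C")]
print(mean_char_iou(gold, pred))
```

In this toy case the half-covered span scores 0.5, while the span linked to the wrong concept scores 0 twice (once for the missed gold concept and once for the spurious predicted one), which shows how this kind of metric penalizes both boundary and linking errors.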

Citations: 0

In defense of empathic informatics.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-11 DOI: 10.1093/jamia/ocaf107

Harry Hochheiser, Shyam Visweswaran

Abstract
Objectives: To explore the potential effects of recent restrictions on discussions regarding diversity, equity, and inclusion (DEI) in the field of biomedical informatics.
Materials and methods: Executive orders issued by the U.S. federal government regarding diversity and gender issues are discussed in the context of implications for biomedical informatics research.
Results: Restrictions on specific terminology can hinder research into critical topics such as bias and fairness in clinical artificial intelligence and machine learning algorithms. Additionally, these limitations may narrow the scope of questions that informatics research can address and obstruct efforts to enhance the diversity of perspectives within the field.
Discussion: Responding to these threats requires a community response. The American Medical Informatics Association (AMIA) can help the informatics community present a united front in support of DEI research in multiple ways.
Conclusion: The informatics community should take a strong and unambiguous response to support diversity, equity, and inclusion of underrepresented perspectives in the field.
Publication type: Journal Article.

Citations: 0

Differences in Physician EHR Use by Telemedicine Intensity: Evidence from Two Academic Medical Centers.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-11 DOI: 10.1093/jamia/ocaf122

Seunghwan Kim, Robert Thombley, Elise Eiden, Sunny S Lou, Julia Adler-Milstein, Thomas Kannampallil, A Jay Holmgren

Abstract
Objective: Evaluate the association between telemedicine intensity and ambulatory physician electronic health record (EHR) use following the COVID-19 pandemic.
Methods: This retrospective study included ambulatory physicians in 11 specialties at two large academic medical centers (Washington University in St Louis [WashU], University of California San Francisco [UCSF]). EHR use measures, including time-based and frequency-based, were analyzed in the post-COVID-19 period (March 1, 2021, through March 7, 2022). Multivariable regression models with two-way fixed effects were used to assess the association between telemedicine intensity and EHR use.
Results: Fully telemedicine physician-weeks were associated with higher EHR (hours per 8 patient scheduled hours; β=3.2 at WashU, β=1.4 at UCSF; p<.001) and documentation time (β=2.7 at WashU, β=1.4 at UCSF; p<.001). Several differences in discrete EHR-based tasks were observed: fully telemedicine physician-days were associated with less ordering, and there were mixed patterns for information seeking and clinical communication tasks.
Discussion: Expanded use of telemedicine was associated with significant changes in physician EHR use post-COVID-19 onset. Increased EHR time may suggest a shift in workload, whereas decreased ordering may suggest constraints in virtual care, such as ability to perform physical examination and the reliance on patient-reported symptoms. Institutional differences in usage patterns suggest that telemedicine's impact is context-specific and provide opportunities for understanding how to optimize EHRs to support telemedicine.
Conclusion: Telemedicine shifts physician EHR use. Supporting physicians through optimized EHR tools, tailored workflows, and team-based interventions is essential for sustainable virtual care delivery without exacerbating EHR burden.
Publication type: Journal Article.

Citations: 0

Biomedical and health informatics Potpourri.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-01 DOI: 10.1093/jamia/ocaf097

Suzanne Bakken

Citations: 0

Incorporating end-user perspectives into the development of a machine learning algorithm for first time perinatal depression prediction.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-01 DOI: 10.1093/jamia/ocaf086

Kelly Williams, Cara Nikolajski, Samantha Rodriguez, Elaine Kwok, Priya Gopalan, Hyagriv Simhan, Tamar Krishnamurti

Abstract
Objective: Machine learning algorithms can advance clinical care, including identifying mental health conditions. These algorithms are often developed without considering the perspectives of the affected populations. This study describes the process of incorporating end-user perspectives into the development and implementation planning of a prediction algorithm for new perinatal depression onset.
Materials and methods: A focus group (N = 12 providers) and four virtual community engagement studios (N = 21 patients) were conducted. The project team presented on the initial development of a novel prediction algorithm used to detect first time perinatal depression. Rapid qualitative analysis coded the prediction algorithm's completeness, interpretability, and acceptability to stakeholders, with the goal of informing clinical implementation of a patient-facing screener produced from the prediction algorithm.
Results: Providers and patients showed consensus on the interpretability of the prediction algorithm's variables and discussed additional variables believed to be predictive of depression to ensure its completeness. In terms of acceptability, patients expressed a desire to discuss predictive risk screening results with their provider, while providers voiced concerns about limited bandwidth for these discussions. Both groups identified the need for post-screening resource connection but raised concerns over the availability of depression prevention specific resources. Providers and patients reported positively about their engagement in the sessions.
Discussion: Qualitative findings were incorporated into iterative algorithm development and informed an implementation pilot plan.
Conclusion: This study demonstrates how the expertise of the end-users of a risk prediction algorithm can be incorporated into its development, which may ultimately increase clinical adoption.
Pages: 1186-1198. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199750/pdf/

Citations: 0

Generating synthetic electronic health record data: a methodological scoping review with benchmarking on phenotype data and open-source software.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-01 DOI: 10.1093/jamia/ocaf082

Xingran Chen, Zhenke Wu, Xu Shi, Hyunghoon Cho, Bhramar Mukherjee

Abstract
Objectives: To conduct a scoping review (ScR) of existing approaches for synthetic Electronic Health Records (EHR) data generation, to benchmark major methods, and to provide an open-source software and offer recommendations for practitioners.
Materials and methods: We search three academic databases for our scoping review. Methods are benchmarked on open-source EHR datasets, Medical Information Mart for Intensive Care III and IV (MIMIC-III/IV). Seven existing methods covering major categories and two baseline methods are implemented and compared. Evaluation metrics concern data fidelity, downstream utility, privacy protection, and computational cost.
Results: Forty-eight studies are identified and classified into five categories. Seven open-source methods covering all categories are selected, trained on MIMIC-III, and evaluated on MIMIC-III or MIMIC-IV for transportability considerations. Among them, Generative Adversarial Network (GAN)-based methods demonstrate competitive performance in fidelity and utility on MIMIC-III, while rule-based methods excel in privacy protection. Similar findings are observed on MIMIC-IV, except that GAN-based methods further outperform the baseline methods in preserving fidelity.
Discussion: Method choice is governed by the relative importance of the evaluation metrics in downstream use cases. We provide a decision tree to guide the choice among the benchmarked methods. An extensible Python package, "SynthEHRella", is provided to facilitate streamlined evaluations.
Conclusion: GAN-based methods excel when distributional shifts exist between the training and testing populations. Otherwise, CorGAN and MedGAN are most suitable for association modeling and predictive modeling, respectively. Future research should prioritize enhancing fidelity of the synthetic data while controlling privacy exposure, and comprehensive benchmarking of longitudinal or conditional generation methods.
Pages: 1227-1240. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203555/pdf/

Citations: 0

Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models.

IF 4.7, Zone 2 (Medicine)

Journal of the American Medical Informatics Association Pub Date : 2025-07-01 DOI: 10.1093/jamia/ocaf084

Kurt Miller, Steven Bedrick, Qiuhao Lu, Andrew Wen, William Hersh, Kirk Roberts, Hongfang Liu

Abstract
Objective: Unlocking clinical information embedded in clinical notes has been hindered to a significant degree by domain-specific and context-sensitive language. Identification of note sections and structural document elements has been shown to improve information extraction and dependent downstream clinical natural language processing (NLP) tasks and applications. This study investigates the viability of a dynamic example selection prompting method to section classification using lightweight, open-source large language models (LLMs) as a practical solution for real-world healthcare clinical NLP systems.
Materials and methods: We develop a dynamic few-shot prompting approach to classifying sections where section samples are first embedded using a transformer-based model and deposited in a vector store. During inference, the embedded samples with the most similar contextual embeddings to a given input section text are retrieved from the vector store and inserted into the LLM prompt. We evaluate this technique on two datasets comprising two section schemas, including varying levels of context. We compare the performance to baseline zero-shot and randomly selected few-shot scenarios.
Results: The dynamic few-shot prompting experiments yielded the highest F1 scores in each of the classification tasks and datasets for all seven of the LLMs included in the evaluation, averaging a macro F1 increase of 39.3% and 21.1% in our primary section classification task over the zero-shot and static few-shot baselines, respectively.
Discussion and conclusion: Our results showcase substantial performance improvements imparted by dynamically selecting examples for few-shot LLM prompting, and further improvement by including section context, demonstrating compelling potential for clinical applications.
Pages: 1164-1173. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12203503/pdf/
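
The retrieve-then-prompt loop described in the methods can be sketched in a few lines. This is a minimal illustration under loud assumptions: the `embed` function is a toy bag-of-words stand-in for the transformer-based embedding model, `VectorStore` is a hypothetical in-memory list rather than a real vector database, and the prompt wording, labels, and example sections are invented for the demonstration. None of these names come from the paper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store of (embedding, text, label) triples."""

    def __init__(self):
        self.items = []

    def add(self, text, label):
        self.items.append((embed(text), text, label))

    def top_k(self, query, k):
        """Return the k labeled samples most similar to the query text."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [(text, label) for _, text, label in ranked[:k]]


def build_prompt(store, section_text, k=2):
    """Insert the retrieved examples as few-shot demonstrations in the prompt."""
    shots = "\n\n".join(
        f"Section: {text}\nLabel: {label}" for text, label in store.top_k(section_text, k)
    )
    return (
        "Classify the clinical note section.\n\n"
        f"{shots}\n\nSection: {section_text}\nLabel:"
    )


# Invented labeled samples; a real system would load annotated note sections.
store = VectorStore()
store.add("Patient takes aspirin and lisinopril daily", "medications")
store.add("Mother had diabetes, father had hypertension", "family_history")
store.add("Allergic to penicillin", "allergies")
print(build_prompt(store, "Patient takes metformin daily", k=1))
```

The static few-shot baseline in the abstract corresponds to replacing `top_k` with a random draw from the store, so the only difference between the two conditions is how the demonstrations are chosen.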

Citations: 0