Journal of Biomedical Informatics: Latest Articles, Page 5

Corrigendum to “A pipeline for harmonising NHS Scotland laboratory data to enable national-level analyses” [J. Biomed. Inform. 2025 Feb;162:104771. https://doi.org/10.1016/j.jbi.2024.104771. Epub 2025 Jan 2. PMID: 39755323]

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-09-01 DOI: 10.1016/j.jbi.2025.104890

Chuang Gao, Shahzad Mumtaz, Sophie McCall, Katherine O'Sullivan, Mark McGilchrist, Daniel R. Morales, Christopher Hall, Katie Wilde, Charlie Mayor, Pamela Linksted, Kathy Harrison, Christian Cole, Emily Jefferson

Citations: 0

Learning from multiple data sources for decision making in health care

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-09-01 DOI: 10.1016/j.jbi.2025.104892

Fabio Stella (Guest Editors), Francesco Calimeri, Mauro Dragoni

Citations: 0

WoundcareVQA: A multilingual visual question answering benchmark dataset for wound care

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-29 DOI: 10.1016/j.jbi.2025.104888

Wen-wai Yim, Asma Ben Abacha, Robert Doerning, Chia-Yu Chen, Jiaying Xu, Anita Subbarao, Zixuan Yu, Fei Xia, M. Kennedy Hall, Meliha Yetisgen

Abstract

Objective: Introduce the task of wound care multimodal multilingual visual question answering, provide baseline performances, and identify areas of future study.

Methods: A dataset of wound care multimodal multilingual visual question answering (VQA) was created using consumer health questions asked online. Practicing US medical doctors were tasked with providing metadata and expert response labels. Several instruct-enabled, multilingual visual question answering models (GPT-4o, Gemini-1.5-Pro, and Qwen-VL) were tested to benchmark performances. Finally, automatic evaluations were tested against domain expert response ratings.

Results: A multilingual dataset of 477 wound care cases, 768 responses, 748 images, 3k structured data labels, 1362 translation instances, and 10k judgments was constructed (https://osf.io/xsj5u/). Metadata scores ranged from 0.32 to 0.78 accuracy depending on classification type; response generation reached 0.06 BLEU, 0.66 BERTScore, and 0.45 ROUGE-L in English and 0.12 BLEU, 0.69 BERTScore, and 0.50 ROUGE-L in Chinese.

Conclusion: We construct and explore the tasks of multimodal, multilingual VQA. We hope the work here can inspire further research in wound care metadata classification, VQA response generation, and open response automatic evaluation.

Volume 170, Article 104888. Publication type: Journal Article.
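For readers unfamiliar with the ROUGE-L figures quoted in this abstract, the following is a minimal sketch of how a ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of two token lists. It assumes lowercased whitespace tokenization and the balanced F1 form; the abstract does not state which scoring implementation or settings the authors used, and the example sentences are invented for illustration.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 under whitespace tokenization (an assumption of this sketch)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


# Invented example pair, not taken from the dataset.
reference = "keep the wound clean and dry"
candidate = "keep wound clean"
score = rouge_l_f1(reference, candidate)
print(f"ROUGE-L F1: {score:.3f}")
```

Here the candidate is fully contained in the reference (precision 1.0) but covers only half of it (recall 0.5), which is why short but correct answers can still score modestly on overlap metrics.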

Citations: 0

MedVidDeID: Protecting privacy in clinical encounter video recordings

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-29 DOI: 10.1016/j.jbi.2025.104901

Sriharsha Mopidevi, Kuk Jin Jang, Basam Alasaly, Sydney Pugh, Jean Park, Ashley Batugo, Sy Hwang, Eric Eaton, Danielle Lee Mowery, Kevin B. Johnson

Abstract

Objective: The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details. Our goal was to create a pipeline for de-identifying AV healthcare data that minimized the human effort required to guarantee successful de-identification.

Methods: We combined open-source tools with novel methods and infrastructure into a six-stage pipeline: (1) transcript extraction using WhisperX, (2) transcript de-identification with an adapted PHIlter, (3) audio de-identification through scrubbing, (4) video de-identification using YOLOv11 for pose detection and blurring, (5) recombining de-identified audio and video, and (6) validation and correction via manual quality control (QC). We developed two de-identification strategies to support different tolerances for lossy video images. We evaluated this pipeline using 10 h of simulated clinical AV recordings, comprising nearly 1.1 million video frames and approximately 72,000 words.

Results: In Precision Privacy Preservation (PPP) mode, MedVidDeID achieved a success rate of 50%, while in Greedy Privacy Preservation (GPP) mode, it achieved a 97.5% success rate. Compared to manual methods for a 15 min video segment, the pipeline reduced de-identification time by 26.7% in PPP mode and 64.2% in GPP mode.

Conclusion: The MedVidDeID pipeline offers a viable, efficient hybrid solution for handling AV healthcare data and privacy preservation. Future work will focus on reducing upstream errors at each stage and minimizing the role of the human in the loop.

Volume 170, Article 104901. Publication type: Journal Article.

Citations: 0

Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-28 DOI: 10.1016/j.jbi.2025.104893

Yunfei Zhang, Wei Liao

Abstract

The monitoring and analysis of adverse drug reactions (ADRs) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) is essential for ensuring drug safety and advancing the integration of drug information. Because existing medical named entity recognition technologies rely on large amounts of manually annotated training data, they are often less effective when applied to adverse drug reactions, where data variability is significant and drug names are highly similar. This paper proposes a prompt template for ADR that integrates error correction examples. The prompt template includes: (1) basic prompts with task descriptions, (2) annotated entity explanations, (3) annotation guidelines, (4) annotated samples for few-shot learning, and (5) error correction examples. Additionally, the paper integrates complex ADR data from the web and constructs a corpus containing three types of entities (drug name, drug components, and adverse drug reactions) using the Begin, Inside, Outside (BIO) annotation method. Finally, we evaluate the effectiveness of each prompt and compare it with the fine-tuned Large Language Model Meta AI (LLaMA) model and the DeepSeek model. Experimental results show that under this prompt template, the F1 score of GPT-3.5 increased from 0.648 to 0.887, and that of GPT-4 increased from 0.757 to 0.921. These results are significantly better than those of the fine-tuned LLaMA model and the DeepSeek model. They demonstrate the superiority of the proposed method and provide a solid foundation for extracting drug-related entity relationships and building knowledge graphs.

Volume 170, Article 104893. Publication type: Journal Article.
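The Begin, Inside, Outside (BIO) annotation method mentioned in this abstract can be illustrated with a short sketch that converts entity spans into per-token tags. The label names (`ADR`, `DRUG`), the example sentence, and the `spans_to_bio` helper are hypothetical; the abstract does not give the corpus's actual tag names or tokenization.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type); labels are hypothetical.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into BIO tags."""
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"        # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens of the entity
    return tags


# Invented sentence for illustration only.
tokens = ["Patient", "developed", "severe", "skin", "rash",
          "after", "taking", "amoxicillin", "."]
spans = [(2, 5, "ADR"), (7, 8, "DRUG")]
bio_tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Tagging at the token level like this is what makes entity-level F1 scores, such as those reported above, straightforward to compute by comparing predicted and gold spans.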

Citations: 0

Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-26 DOI: 10.1016/j.jbi.2025.104902

Gabriel Ferreira dos Santos Silva, Fabiano Novaes Barcellos Filho, Roberta Moreira Wichmann, Francisco Costa da Silva Junior, Alexandre Dias Porto Chiavegatto Filho

Abstract

Objective: This review aims to provide a comprehensive overview of the literature on methods and techniques for identifying and correcting dataset shift in machine learning (ML) applications for health predictions.

Methods: A systematic search was conducted across PubMed, IEEE Xplore, Scopus, and Web of Science, targeting articles published between January 1, 2019, and March 15, 2025. Search strings combined terms related to machine learning, healthcare, and dataset shift. A total of 32 studies were included and evaluated based on the dataset shift types addressed, detection and correction strategies used, algorithmic choices, and reported impacts on model performance.

Results: The review identified a wide range of dataset shift types, with temporal shift and concept drift being the most commonly addressed. Model-based monitoring and statistical tests were the most frequent detection strategies, while retraining and feature engineering were the predominant correction approaches. Most methods demonstrated moderate interpretability, computational feasibility, and generalizability. However, a lack of standardized performance metrics and external validations limited the comparability of results across studies.

Conclusion: While several promising approaches for managing dataset shift in health-related ML models have been proposed, no single method emerged as broadly generalizable across use cases. The implementation of these techniques in real-world clinical workflows remains limited. Future research should prioritize prospective evaluations, subgroup-specific analyses (e.g., by race, age, or geographic region), and integration into clinical decision-support systems to ensure robust and equitable ML deployment in healthcare settings. A structured summary table and conceptual pipeline diagram are provided to support practical adoption.

Volume 170, Article 104902. Publication type: Journal Article.
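As one concrete instance of the "statistical tests" detection strategy named in this abstract, the sketch below computes a two-sample Kolmogorov-Smirnov statistic for a single numeric feature and compares it with a large-sample critical value. The choice of the KS test, the 1.36 coefficient (the usual approximation for a 5% significance level), and the age data are assumptions of this sketch; the review does not single out this test.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical cumulative distribution functions."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    if not ref_sorted or not cur_sorted:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum gap is always reached at one of the observed values.
    for x in ref_sorted + cur_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def shift_detected(reference: Sequence[float], current: Sequence[float],
                   c_alpha: float = 1.36) -> bool:
    """Flag a shift when the KS statistic exceeds the approximate critical value."""
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Invented data: patient ages at training time vs. at deployment time.
train_ages = list(range(40, 60))
deploy_ages = list(range(50, 70))
d = ks_statistic(train_ages, deploy_ages)
flagged = shift_detected(train_ages, deploy_ages)
print(f"KS statistic: {d:.2f}, shift flagged: {flagged}")
```

A per-feature test like this only detects changes in input distributions; concept drift, where the relationship between inputs and outcomes changes, generally needs model-based monitoring against labeled outcomes.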

Citations: 0

CeRTS: certainty retrieval token search in large language model clinical information extraction

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-23 DOI: 10.1016/j.jbi.2025.104900

Lars E. Schimmelpfennig, Kriti Bhattarai, Inez Y. Oh, Jake Lever, Obi L. Griffith, Malachi Griffith, Albert M. Lai, Zachary B. Abrams

Abstract

Objective: Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence. Here, by leveraging the constraints of JSON output structure, we instead consider all likely sequences and their respective probabilities to obtain a more robust measure of model confidence. We develop Certainty Retrieval Token Search (CeRTS), a new uncertainty estimator for structured information extraction.

Methods: We evaluated CeRTS against a previous gold-standard uncertainty estimator when extracting clinical features from lung cancer discharge summaries across eight open-source LLMs. Calibration (Brier score) and discrimination (AUROC) were used to quantify performance.

Results: CeRTS surpassed the previous gold-standard estimator in discriminatory power across every model and achieved better calibration in most cases. CeRTS had the strongest agreement between model confidence and accuracy with Qwen-2.5.

Conclusion: CeRTS enhances LLM-based information extraction from unstructured clinical text by assigning well-calibrated confidence scores to each extracted item, providing medical researchers with a quantitative measure of reliability at minimal additional cost. Although its performance was generally robust, CeRTS struggled with DeepSeek-R1, which we attribute to the model’s Chain-of-Thought reasoning steps. Our evaluation focused on clinical data, but CeRTS can be applied to any domain requiring reliable uncertainty estimation.

Volume 170, Article 104900. Publication type: Journal Article.
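The two evaluation measures named in this abstract, calibration via the Brier score and discrimination via AUROC, can be sketched in a few lines. This is not the CeRTS estimator itself, only a minimal illustration of how confidence scores are scored against correctness labels; the confidence values and labels are invented.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label."""
    if not correct:
        raise ValueError("need at least one prediction")
    return sum((p - y) ** 2 for p, y in zip(confidences, correct)) / len(correct)


def auroc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct extraction gets a higher confidence than an
    incorrect one, counting ties as half (pairwise definition of AUROC)."""
    pos = [p for p, y in zip(confidences, correct) if y == 1]
    neg = [p for p, y in zip(confidences, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented confidences for four extracted items; 1 = extraction was correct.
confidences = [0.9, 0.8, 0.6, 0.3]
correct = [1, 1, 0, 1]
brier = brier_score(confidences, correct)
auc = auroc(confidences, correct)
print(f"Brier score: {brier:.3f}, AUROC: {auc:.3f}")
```

Lower Brier scores indicate better calibration, while higher AUROC indicates that confidence separates correct from incorrect extractions more cleanly; an estimator can do well on one and poorly on the other, which is why the abstract reports both.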

Citations: 0

Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-21 DOI: 10.1016/j.jbi.2025.104899

Weizhe Xu, Serguei Pakhomov, Patrick Heagerty, Eric Horvitz, Ellen R. Bradley, Josh Woolley, Andrew Campbell, Alex Cohen, Dror Ben-Zeev, Trevor Cohen

Abstract

Objective: Semantic coherence in speech is characterized by a logical, connected flow of ideas. A lack of coherence in speech may reflect disorganized thinking, a core feature of psychosis in schizophrenia spectrum disorders (SSDs). Developing tools that could help with automated assessment of semantic coherence in language could facilitate early detection of SSDs and improved monitoring of symptoms, enabling more timely intervention. Large language models (LLMs) have demonstrated strong capabilities on numerous language-centric tasks and have shown promise for analyzing semantic coherence due to the natural fit between their innate measures of language perplexity and the surprising turns that incoherent narrative often takes. This study aims to develop a novel representation and associated measure of semantic coherence using LLM-based perplexity metrics and to compare this measure with traditional vector distance-based coherence metrics.

Method: We evaluated “bag” and “chain” models based on LLM perplexities as measures of semantic coherence. Regression models were trained using both single and paired combinations of perplexity- and proximity-based features to predict human ratings of semantic coherence using standardized instruments. Performance was evaluated on held-out examples from a training set of speeches from individuals experiencing psychotic symptoms and a test set of clinical interviews with patients diagnosed with SSDs, both with labels from human assessments of disorganized thinking severity.

Results: The best performance was achieved using a combination of perplexity and proximity features, yielding a Spearman correlation with human ratings of 0.61 (vs. 0.56 with proximity features alone) on leave-one-out cross-validation in the training set, and 0.54 (vs. 0.52 with proximity features alone) on the test set.

Conclusion: We developed novel methods for assessing semantic coherence using LLM perplexities and found them complementary to proximity-based methods. Combined, these methods showed improved performance across two datasets, highlighting the potential of LLMs in enhancing automated diagnosis and monitoring of SSDs.

Volume 170, Article 104899. Publication type: Journal Article.
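The two feature families contrasted in this abstract can be illustrated generically: perplexity is the exponential of the average negative log-probability a language model assigns to each token, and a proximity feature can be as simple as cosine similarity between vectors for adjacent units of speech. The sketch below shows only these textbook definitions; it does not reproduce the paper's "bag" and "chain" models, and the probabilities and vectors are invented.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length vector")
    return dot / (norm_u * norm_v)


# Invented per-token probabilities from a hypothetical language model.
token_probs = [0.5, 0.25, 0.5, 0.25]
ppl = perplexity([math.log(p) for p in token_probs])

# Invented 2-D "embeddings" for two adjacent sentences.
similarity = cosine_similarity([1.0, 0.0], [1.0, 1.0])

print(f"perplexity: {ppl:.3f}, cosine similarity: {similarity:.3f}")
```

Higher perplexity means the model found the text more surprising, while lower similarity between neighboring units suggests a topical jump; the abstract's finding is that the two signals carry complementary information.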

Citations: 0

Medication information extraction using local large language models

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-21 DOI: 10.1016/j.jbi.2025.104898

Phillip Richter-Pechanski, Marvin Seiferling, Christina Kiriakou, Dominic M. Schwab, Nicolas A. Geis, Christoph Dieterich, Anette Frank

Abstract

Objective: Medication information is crucial for clinical routine and research. However, a vast amount is stored in unstructured text, such as doctor’s letters, requiring manual extraction, which is a resource-intensive and error-prone task. Automating this process comes with significant constraints in a clinical setup, including the demand for clinical expertise, limited time resources, restricted IT infrastructure, and the demand for transparent predictions. Recent advances in generative large language models (LLMs) and parameter-efficient fine-tuning methods show potential to address these challenges.

Methods: We evaluated local LLMs for end-to-end extraction of medication information, combining named entity recognition and relation extraction. We used format-restricting instructions and developed an innovative feedback pipeline to facilitate automated evaluation. We applied token-level Shapley values to visualize and quantify token contributions and thereby improve the transparency of model predictions.

Results: Two open-source LLMs, one general (Llama) and one domain-specific (OpenBioLLM), were evaluated on the English n2c2 2018 corpus and the German CARDIO:DE corpus. OpenBioLLM frequently struggled with structured outputs and hallucinations. Fine-tuned Llama models achieved new state-of-the-art results, improving F1-score by up to 10 percentage points (pp.) for adverse drug events and 6 pp. for medication reasons on English data. On the German dataset, Llama established a new benchmark, outperforming traditional machine learning methods by up to 16 pp. in micro-average F1-score.

Conclusion: Our findings show that fine-tuned local open-source generative LLMs outperform SOTA methods for medication information extraction, delivering high performance with limited time and IT resources in a real-world clinical setup, and demonstrate their effectiveness on both English and German data. Applying Shapley values improved prediction transparency, supporting informed clinical decision-making.

Volume 169, Article 104898. Publication type: Journal Article.
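The micro-average F1-score used for the German results pools true positives, false positives, and false negatives across all entity types before computing a single F1, so frequent types dominate the score. The sketch below shows that calculation; the label names and counts are invented and are not results from the paper.

```python
from typing import Dict, Tuple

# label -> (true positives, false positives, false negatives)
Counts = Dict[str, Tuple[int, int, int]]


def micro_f1(counts: Counts) -> float:
    """F1 computed from counts pooled over all labels."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts for two hypothetical entity types.
counts: Counts = {
    "Drug": (90, 10, 5),
    "ADE": (10, 10, 15),
}
score = micro_f1(counts)
print(f"micro-average F1: {score:.3f}")
```

In this invented example the rare `ADE` type is extracted poorly, yet the pooled score stays high because `Drug` contributes most of the counts; this is one reason per-type gains, such as those the abstract reports for adverse drug events, are worth reading alongside the micro average.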

Citations: 0

Resource-efficient instruction tuning of large language models for biomedical named entity recognition

IF 4.5 | Zone 2, Medicine

Journal of Biomedical Informatics Pub Date: 2025-08-21 DOI: 10.1016/j.jbi.2025.104896

Hui Liu, Ziyi Chen, Peilin Li, Yuan-Zhi Liu, Xiangtao Liu, Ronald X. Xu, Mingzhai Sun

Abstract

Objective: Large language models (LLMs) have exhibited remarkable efficacy in natural language processing (NLP) tasks, with fine-tuning for Biomedical Named Entity Recognition (BioNER) receiving significant research attention. However, the substantial computational demands associated with fine-tuning large-scale models constrain their development and deployment. Consequently, this study investigates parameter-efficient fine-tuning (PEFT) techniques to optimize LLMs for BioNER under limited computational resources. By leveraging these methods, competitive model performance is maintained while preserving in-domain generalization capability.

Methods: In this study, we employed the PEFT method QLoRA to fine-tune the open-source Llama3.1 model, developing the NERLlama3.1 model specifically designed for the BioNER task. First, an LLM instruction tuning dataset was created using BioNER datasets such as NCBI-disease, BC5CDR-chem, and BC2GM-gene. Next, the Llama3.1-8B model was fine-tuned using the QLoRA method on a single GPU with 16 GB of memory. Furthermore, during the inference phase, we introduced a prompt engineering technique called self-consistency NER prompting (SCNP). This approach leverages the diversity of outputs generated by LLMs to significantly enhance NER performance. Finally, we also developed a multi-task BioNER-capable model, NERLlama3.1-MT, to investigate the capability of fine-tuned LLMs in addressing multi-task BioNER scenarios.

Results: The NERLlama3.1 model achieved F1-scores of 0.8977, 0.9402, and 0.8530 on the NCBI-disease, BC5CDR-chemical, and BC2GM-gene datasets, respectively. Furthermore, when evaluated on previously unseen datasets, it attained F1-scores of 0.6867 on BC5CDR-disease, 0.6800 on NLM-chemical, and 0.8378 on NLM-gene. These results demonstrate that NERLlama3.1 not only outperforms fully fine-tuned LLMs but also exhibits superior in-domain generalization capabilities when compared to the BERT-base model. Additionally, this work represents the first exploration of fine-tuning LLMs for multi-task BioNER.

Conclusion: NERLlama3.1 outperformed LLMs fine-tuned with full parameter updates, despite requiring significantly fewer computational resources. Moreover, it exhibited substantially superior in-domain generalization capabilities compared to traditional pre-trained language models. Its low resource demands, high performance, and strong generalization enhance its applicability and utility across diverse clinical BioNER tasks.

Volume 170, Article 104896. Publication type: Journal Article.
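The abstract describes self-consistency NER prompting (SCNP) only as leveraging the diversity of LLM outputs. One common way to realize that idea is to sample several outputs for the same input and keep the entities that a majority of samples agree on. The sketch below shows that majority-vote pattern; the voting rule, the `min_share` threshold, and the example entities are assumptions of this sketch, not the paper's documented procedure.

```python
from collections import Counter
from typing import List, Set, Tuple

# (surface text, entity type); the type names are illustrative.
Entity = Tuple[str, str]


def self_consistency_vote(samples: List[Set[Entity]],
                          min_share: float = 0.5) -> Set[Entity]:
    """Keep entities that appear in more than `min_share` of the sampled outputs."""
    if not samples:
        return set()
    counts = Counter(entity for sample in samples for entity in sample)
    threshold = min_share * len(samples)
    return {entity for entity, count in counts.items() if count > threshold}


# Invented outputs from three sampled generations for the same sentence.
samples = [
    {("breast cancer", "Disease")},
    {("breast cancer", "Disease"), ("cancer", "Disease")},
    {("breast cancer", "Disease"), ("BRCA1", "Disease")},
]
voted = self_consistency_vote(samples)
print(sorted(voted))
```

Voting of this kind tends to suppress one-off spurious entities at the cost of extra inference passes, a trade-off that matters when the stated goal is to run on a single 16 GB GPU.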

Citations: 0