Impact of Detailed Versus Generic Instructions on Fine-Tuned Language Models for Patient Discharge Instructions Generation: Comparative Statistical Analysis

Muneerah Alqahtani, Abdullah Al-Barakati, Fahd Alotaibi, Mohammed Al Shibli, Saad Almousa

JMIR Formative Research. Published September 30, 2025. doi: 10.2196/80917
Background: Discharge instructions are essential to patients' post-hospital care but are time-consuming to write. With the rise of large language models (LLMs), there is strong potential to automate this process. This study explores the use of open-source LLMs to generate discharge instructions.
Objective: We investigated whether a Mistral model can reliably generate patient-oriented discharge instructions. Two distinct instruction-tuning paradigms were compared, each using a different mechanism for embedding guidance during fine-tuning.
Methods: In our experiment, we fine-tuned Mistral-NeMo-Instruct, a large language model, under two distinct instruction strategies. The first was a detailed instruction tailored to the task of discharge instruction generation; the second was a generic instruction with minimal guidance and no task-specific detail. The independent variable in this study is the instruction strategy (detailed vs generic), and the dependent variables are the evaluation scores of the generated discharge instructions. The generated outputs were evaluated against 3,621 ground-truth references. We used BLEU-1 to BLEU-4, ROUGE (ROUGE-1, ROUGE-2, ROUGE-L), METEOR, SentenceTransformer similarity, and BERTScore to assess the quality of the generated outputs in comparison to the corresponding ground-truth instructions for the same discharge summaries.
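To make the setup concrete, the following Python sketch illustrates one plausible way to encode the two instruction variants as fine-tuning examples and to score a generated output against its reference. The prompt wording, field names, and helper functions are hypothetical, not the authors' exact templates; the metric calls use the widely available nltk, rouge-score, and bert-score packages.

```python
# Hypothetical sketch: the prompt texts and example layout are illustrative,
# not the templates used in the study.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

DETAILED_INSTRUCTION = (
    "You are a clinician writing patient discharge instructions. Given the "
    "discharge summary below, write clear, patient-friendly instructions "
    "covering medications, activity, diet, warning signs, and follow-up care."
)
GENERIC_INSTRUCTION = "Write discharge instructions for the following discharge summary."

def build_example(summary: str, reference: str, detailed: bool) -> dict:
    """Format one fine-tuning example in a simple instruction/input/output layout."""
    return {
        "instruction": DETAILED_INSTRUCTION if detailed else GENERIC_INSTRUCTION,
        "input": summary,
        "output": reference,
    }

def evaluate(generated: str, reference: str) -> dict:
    """Score one generated instruction set against its ground-truth reference."""
    smooth = SmoothingFunction().method1
    ref_tokens, gen_tokens = reference.split(), generated.split()
    # BLEU-4: geometric mean of 1- to 4-gram precisions, with smoothing.
    bleu4 = sentence_bleu([ref_tokens], gen_tokens,
                          weights=(0.25, 0.25, 0.25, 0.25),
                          smoothing_function=smooth)
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, generated)
    # BERTScore F1 from contextual embeddings of candidate and reference.
    _, _, f1 = bert_score([generated], [reference], lang="en")
    return {
        "bleu4": bleu4,
        "rouge1": rouge["rouge1"].fmeasure,
        "rouge2": rouge["rouge2"].fmeasure,
        "rougeL": rouge["rougeL"].fmeasure,
        "bertscore_f1": f1.item(),
    }
```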
Results: The detailed-instruction model outperformed the generic-instruction model on all automated evaluation metrics. BERTScore increased from 78.92% to 87.05%, while structural alignment measured by ROUGE-L improved from 8.59% to 26.52%. N-gram precision (BLEU-4) increased from 0.81% to 21.24%, and METEOR scores rose from 15.33% to 18.47%. Additional metrics showed consistent gains: ROUGE-1 improved from 16.59% to 42.72%, and ROUGE-2 increased from 1.97% to 45.84%. All improvements were statistically significant (P < .001), indicating that detailed, task-specific instruction design substantially enhances model performance.
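The abstract reports P < .001 without naming the test, but because both models are scored on the same 3,621 references, the comparison is naturally a paired one. A minimal sketch, assuming per-document metric scores are available as paired samples (the variable names and the choice of tests are assumptions, not the paper's stated procedure):

```python
# Hypothetical sketch: paired significance tests over per-document metric scores.
# `detailed_scores` and `generic_scores` are assumed to be equal-length sequences
# of, e.g., ROUGE-L F1 values for the same discharge summaries under each model.
from scipy import stats

def compare_models(detailed_scores, generic_scores, alpha=0.001):
    """Test whether the detailed model's paired per-document scores differ
    significantly from the generic model's."""
    # Paired t-test on the score differences (assumes roughly normal differences).
    t_stat, t_p = stats.ttest_rel(detailed_scores, generic_scores)
    # Wilcoxon signed-rank test as a nonparametric alternative.
    w_stat, w_p = stats.wilcoxon(detailed_scores, generic_scores)
    return {
        "t_statistic": t_stat, "t_p_value": t_p,
        "wilcoxon_statistic": w_stat, "wilcoxon_p_value": w_p,
        "significant_at_alpha": t_p < alpha and w_p < alpha,
    }
```

Running both tests is a common robustness check: the t-test gains power when score differences are approximately normal, while the Wilcoxon test guards against skewed metric distributions.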
Conclusions: The use of detailed, task-specific instruction strategies significantly enhances the effectiveness of open-source large language models in generating discharge instructions. These findings indicate that carefully designed instructions during fine-tuning substantially improve model performance.