Small language models learn enhanced reasoning skills from medical textbooks

Impact Factor: 12.4 · JCR Q1, HEALTH CARE SCIENCES & SERVICES (CAS Region 1, Medicine)
Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Jungwoo Park, Olga Reykhart, Thomas Fetherston, Donghee Choi, Soo Heon Kwak, Qingyu Chen, Jaewoo Kang
DOI: 10.1038/s41746-025-01653-8
Journal: npj Digital Medicine
Publication date: 2025-05-02
Publication type: Journal Article
Citations: 0

Abstract

Small language models (SLMs) offer promise for medical applications by addressing the privacy and hardware constraints of large language models; however, their limited parameter counts (often fewer than ten billion) hinder multi-step reasoning for complex medical tasks. This study presents Meerkat, a new family of medical SLMs designed to be lightweight while offering enhanced reasoning capabilities. We begin by designing an effective and efficient training method. This involves extracting high-quality chain-of-thought reasoning paths from 18 medical textbooks, which are then combined with diverse instruction-following datasets within the medical domain, totaling 441K training examples. Fine-tuning was conducted on open-source SLMs using this curated dataset. Our Meerkat-7B and Meerkat-8B models outperformed their counterparts by 22.3% and 10.6% across six exam datasets, respectively. They also improved scores on the NEJM Case Challenge from 7 to 16 and from 13 to 20, surpassing the human score of 13.7. Additionally, they demonstrated superiority in expert evaluations, excelling in all metrics of reasoning ability: completeness, factuality, clarity, and logical consistency.
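The training recipe described above (pairing a question with an extracted chain-of-thought reasoning path and a final answer, then fine-tuning on the resulting instruction data) can be sketched as follows. This is a minimal illustration of the data-formatting step only; the `format_cot_example` helper, the prompt template, and the field names are assumptions for illustration, not the authors' actual code or dataset schema.

```python
# Illustrative sketch: turn a medical question plus an extracted
# chain-of-thought reasoning path into one instruction-tuning record,
# in the style the abstract describes. Template and keys are assumed.

def format_cot_example(question: str, reasoning_steps: list[str], answer: str) -> dict:
    """Combine a question, its chain-of-thought steps, and the final
    answer into a single supervised fine-tuning example."""
    chain = "\n".join(f"Step {i + 1}: {step}" for i, step in enumerate(reasoning_steps))
    return {
        "instruction": (
            "Answer the medical question step by step.\n\n"
            f"Question: {question}"
        ),
        "output": f"{chain}\nTherefore, the answer is {answer}.",
    }

example = format_cot_example(
    question="Deficiency of which vitamin causes scurvy?",
    reasoning_steps=[
        "Scurvy presents with bleeding gums and poor wound healing.",
        "These findings reflect defective collagen synthesis.",
        "Collagen hydroxylation requires ascorbic acid as a cofactor.",
    ],
    answer="vitamin C",
)
print(example["instruction"])
print(example["output"])
```

Records in this shape could then be fed to any standard supervised fine-tuning loop over an open-source 7B/8B base model, which is the stage the abstract summarizes as "fine-tuning was conducted on open-source SLMs using this curated dataset."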


Source journal
CiteScore: 25.10
Self-citation rate: 3.30%
Annual publications: 170
Review time: 15 weeks
About the journal: npj Digital Medicine is an online open-access journal that publishes peer-reviewed research in digital medicine, covering the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics. The journal's primary goal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. In assessing manuscripts for publication, the journal weighs four criteria: novelty, clinical relevance, scientific rigor, and digital innovation.