Fine-tuning large language models for domain adaptation: exploration of training strategies, scaling, model merging and synergistic capabilities

IF 11.9 | Zone 1, Materials Science | Q1 CHEMISTRY, PHYSICAL

npj Computational Materials | Pub Date: 2025-03-28 | DOI: 10.1038/s41524-025-01564-y

Wei Lu, Rachel K. Luu, Markus J. Buehler


Citations: 0

Abstract

The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that merging multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging is not merely a process of aggregation but a transformative method that can drive substantial advancements in model capabilities. It is characterized by highly nonlinear interactions between model parameters, which result in new functionalities that neither parent model could achieve alone and lead to improved performance in domain-specific assessments. We study critical factors that influence the success of model merging, such as the diversity between parent models and the fine-tuning techniques employed. The insights underscore the potential of strategic model merging to unlock novel capabilities in LLMs, offering an effective tool for advancing AI systems to meet complex challenges. Experiments with different model architectures are presented, including the Llama 3.1 8B and Mistral 7B families of models, where similar behaviors are observed. Exploring whether the results also hold for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment reveals detailed insights into how different model variants perform, and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts that seek to reason over disparate biological material design concepts in order to create new microstructures, architectural concepts, and urban designs based on biological materials-inspired construction principles. We conclude with a series of questions about scaling and emergence that could be addressed in future research.
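The abstract describes merging two fine-tuned parent models but, as quoted here, does not say which merging algorithm is used. As a rough illustration only, the sketch below assumes spherical linear interpolation (SLERP), one commonly used merging technique, applied tensor by tensor to two parameter dictionaries. The function names `slerp` and `merge_state_dicts`, the NumPy representation of weights, and the interpolation factor `t` are all hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def slerp(w_a, w_b, t, eps=1e-8):
    """Spherically interpolate between two weight tensors of equal shape.

    Assumption: SLERP is used for illustration; the source text does not
    name the merging method.
    """
    a = np.asarray(w_a, dtype=np.float64).ravel()
    b = np.asarray(w_b, dtype=np.float64).ravel()
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)

    # Degenerate case: a zero tensor has no direction, so interpolate linearly.
    if norm_a < eps or norm_b < eps:
        return ((1.0 - t) * a + t * b).reshape(np.shape(w_a))

    # Angle between the two flattened weight vectors.
    cos_omega = np.clip(np.dot(a, b) / (norm_a * norm_b), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    sin_omega = np.sin(omega)

    if sin_omega < eps:
        # Nearly parallel vectors: SLERP reduces to linear interpolation.
        merged = (1.0 - t) * a + t * b
    else:
        merged = (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / sin_omega
    return merged.reshape(np.shape(w_a))


def merge_state_dicts(sd_a, sd_b, t=0.5):
    """Merge two parameter dictionaries (name -> array) tensor by tensor."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("parent models must share the same parameter names")
    return {name: slerp(sd_a[name], sd_b[name], t) for name in sd_a}


if __name__ == "__main__":
    # Toy "parents" with a single 1x2 weight matrix each.
    parent_a = {"layer.weight": np.array([[1.0, 0.0]])}
    parent_b = {"layer.weight": np.array([[0.0, 1.0]])}
    child = merge_state_dicts(parent_a, parent_b, t=0.5)
    print(child["layer.weight"])
```

Unlike a plain average, the interpolated weights here depend on the angle between the parents' weight vectors, so the merged parameters are a nonlinear function of the inputs. Whether this produces the emergent behavior the abstract reports depends on the models and training recipes, which this toy example does not reproduce.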




Source journal: npj Computational Materials (Mathematics-Modeling and Simulation)
CiteScore: 15.30
Self-citation rate: 5.20%
Articles published: 229
Review time: 6 weeks

Journal description: npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches to the design of new materials and to improving our understanding of existing ones. The journal also welcomes papers on new computational techniques and the refinement of current approaches that support these aims, as well as experimental papers that complement computational findings. Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround time of 11 days from submission to the first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (CAS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.