{"title":"AdaFT: An efficient domain-adaptive fine-tuning framework for sentiment analysis in chinese financial texts","authors":"Guofeng Yan, Kuashuai Peng, Yongfeng Wang, Hengliang Tan, Jiao Du, Heng Wu","doi":"10.1007/s10489-025-06578-z","DOIUrl":null,"url":null,"abstract":"<div><p>Given the prevalence of pre-trained language models (PLMs) within the field of natural language processing, it has become evident that the conventional two-stage approach of <i>‘pre-training’</i>-then-<i>‘fine-tuning’</i> consistently yields commendable outcomes. Nevertheless, most publicly accessible PLMs are pre-trained on extensive, general-purpose datasets, thereby failing to address the substantial domain dissimilarity between the source and target data. This discrepancy has significant implications for the adaptability of PLMs to specific domains. To address this issue, our study proposes AdaFT, an efficient domain-adaptive fine-tuning framework that seeks to enhance the traditional fine-tuning process, thus bridging the gap between the source and target domains. This is particularly beneficial for enabling PLMs to better align with the specialized context of the Chinese financial domain. In contrast to the standard two-stage paradigm, AdaFT incorporates two additional stages: <i> ’multi-task further pre-training’</i> and <i> ’multi-model parameter fusion.’</i> In the first phase, the PLM undergoes a rapid, multi-task, parallel learning process, which effectively augments its proficiency in Chinese financial domain-related tasks. In the subsequent stage, we introduce an adaptive multi-model parameter fusion (AdaMFusion) strategy to amalgamate the knowledge acquired from the extended pre-training. To efficiently allocate weights for AdaMFusion, we have developed a local search algorithm with a decreasing step length, i.e., Local Search with Decreasing Step size (LSDS). The combination of AdaMFusion and LSDS algorithm strikes a balance between efficiency and performance, making it suitable for most scenarios. We also find that the optimal weighting factor assigned to a model to be fused is positively correlated with the performance improvement of that model on the target task after further pre-training. We demonstrate that further pre-training is generally effective, and further pre-training on domain-relevant corpora is more effective than on task-relevant corpora. Our extensive experiments, utilizing BERT (Bidirectional Encoder Representations from Transformers) as an illustrative example, indicate that Chinese BERT-base trained under the AdaFT framework attains an accuracy rate of 94.95% in the target task, marking a substantial 3.12% enhancement when compared to the conventional fine-tuning approach. 
Furthermore, our results demonstrate that AdaFT remains effective when applied to BERT-based variants, such as Chinese ALBERT-base.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 10","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06578-z","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Given the prevalence of pre-trained language models (PLMs) in natural language processing, the conventional two-stage approach of 'pre-training' then 'fine-tuning' consistently yields strong results. Nevertheless, most publicly accessible PLMs are pre-trained on large, general-purpose corpora and therefore do not address the substantial domain dissimilarity between the source and target data. This discrepancy limits the adaptability of PLMs to specific domains. To address this issue, our study proposes AdaFT, an efficient domain-adaptive fine-tuning framework that enhances the traditional fine-tuning process and bridges the gap between the source and target domains, which is particularly beneficial for aligning PLMs with the specialized context of the Chinese financial domain. In contrast to the standard two-stage paradigm, AdaFT incorporates two additional stages: 'multi-task further pre-training' and 'multi-model parameter fusion'. In the first stage, the PLM undergoes rapid, multi-task, parallel learning, which effectively improves its proficiency on Chinese financial domain-related tasks. In the subsequent stage, we introduce an adaptive multi-model parameter fusion (AdaMFusion) strategy to amalgamate the knowledge acquired during the further pre-training. To allocate weights for AdaMFusion efficiently, we develop a local search algorithm with a decreasing step length, Local Search with Decreasing Step size (LSDS). The combination of AdaMFusion and the LSDS algorithm strikes a balance between efficiency and performance, making it suitable for most scenarios. We also find that the optimal weighting factor assigned to a model being fused is positively correlated with that model's performance improvement on the target task after further pre-training. We demonstrate that further pre-training is generally effective, and that further pre-training on domain-relevant corpora is more effective than on task-relevant corpora. Our extensive experiments, using BERT (Bidirectional Encoder Representations from Transformers) as an illustrative example, show that Chinese BERT-base trained under the AdaFT framework attains an accuracy of 94.95% on the target task, a substantial 3.12% improvement over the conventional fine-tuning approach. Furthermore, our results demonstrate that AdaFT remains effective when applied to BERT-based variants such as Chinese ALBERT-base.
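The abstract describes AdaMFusion and LSDS only at a high level. Below is a minimal, hypothetical sketch of what such a scheme could look like, under assumptions that are ours rather than the paper's: that parameter fusion is a convex combination of same-shaped parameter tensors, and that LSDS is a coordinate-wise local search over the fusion weights whose step size is halved whenever no single-weight move improves the objective. The names `fuse`, `lsds`, and `evaluate`, and the step-size constants, are all illustrative; `evaluate` stands in for validation accuracy of the fused model on the target task.

```python
# Sketch only: one plausible reading of weight-space model fusion plus a
# decreasing-step local search over the fusion weights. Not the paper's code.
from typing import Callable, Dict, List
import numpy as np

Params = Dict[str, np.ndarray]


def fuse(models: List[Params], weights: List[float]) -> Params:
    """Fuse models in weight space via a convex combination of their
    parameter tensors (assumes identical parameter names and shapes)."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalise so the combination stays convex
    return {name: sum(wi * m[name] for wi, m in zip(w, models))
            for name in models[0]}


def lsds(models: List[Params],
         evaluate: Callable[[Params], float],
         init_step: float = 0.2,
         min_step: float = 0.0125) -> List[float]:
    """Local search with a decreasing step size over the fusion weights:
    perturb one weight at a time by +/- step and keep any improvement;
    halve the step once no single-coordinate move helps."""
    n = len(models)
    weights = [1.0 / n] * n  # start from a uniform fusion
    best = evaluate(fuse(models, weights))
    step = init_step
    while step >= min_step:
        improved = False
        for i in range(n):
            for delta in (step, -step):
                cand = list(weights)
                cand[i] = max(0.0, cand[i] + delta)
                if sum(cand) == 0.0:
                    continue  # degenerate candidate, skip
                score = evaluate(fuse(models, cand))
                if score > best:
                    best, weights, improved = score, cand, True
        if not improved:
            step /= 2.0  # no neighbour improved: shrink the step
    return weights


if __name__ == "__main__":
    # Toy usage: two one-parameter "models"; the objective peaks when the
    # fused parameter equals 0.3, i.e. at roughly a 70/30 mix.
    m1, m2 = {"w": np.array([0.0])}, {"w": np.array([1.0])}
    objective = lambda p: -abs(float(p["w"][0]) - 0.3)
    print(lsds([m1, m2], objective))  # weights close to [0.7, 0.3]
```

In the toy run at the end, the search starts from uniform weights, improves while the 0.2 step still finds better mixes, then halves the step until it falls below the floor, returning weights near [0.7, 0.3]. The halving schedule is what trades search cost against precision in this reading of LSDS.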
Journal Introduction
With a focus on research in artificial intelligence and neural networks, this journal addresses real-life problems in manufacturing, defense, management, government, and industry that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments addressing real, complex problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.