Shaping pre-trained language models for task-specific embedding generation via consistency calibration

Impact Factor: 6.0 · CAS Tier 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jianqi Gao, Hang Yu, Yiu-ming Cheung, Jian Cao, Raymond Chi-Wing Wong, Yonggang Zhang
{"title":"通过一致性校准为特定任务嵌入生成塑造预训练语言模型","authors":"Jianqi Gao ,&nbsp;Hang Yu ,&nbsp;Yiu-ming Cheung ,&nbsp;Jian Cao ,&nbsp;Raymond Chi-Wing Wong ,&nbsp;Yonggang Zhang","doi":"10.1016/j.neunet.2025.107754","DOIUrl":null,"url":null,"abstract":"<div><div>Pre-trained language models (PLMs) have shown significant success in various downstream tasks by providing initial parameters for task-specific fine-tuning. An inherent challenge of this approach is that adapting solely to downstream tasks may lead to the forgetting of pre-trained knowledge, resulting in limited fine-tuning performance on downstream tasks. To tackle this challenge, we propose a novel approach called EGO-PLM, where PLMs serve as task-specific <u>e</u>mbedding <u>g</u>enerat<u>o</u>r. The underlying insight of EGO-PLM is to align the fine-tuning tasks for PLMs with those utilized during the pre-training phase. Within this framework, we design a task-agnostic pre-defined task that is similar to the pre-training phase and a task-specific embedding generator to adapt to specific tasks, enabling the specific task can be trained jointly with the pre-defined task. To alleviate task conflicts between pre-defined and task-specific tasks and make sure the generated embedding are task-specific, we propose <em>co</em>nsistency <em>ca</em>libration (CoCa), which aligns the pre-defined objectives with the task-specific ones. Specifically, CoCa identifies inconsistencies between the pre-defined and task-specific objectives in an adversarial manner, subsequently calibrating these disparities through adversarial training. We validate the effectiveness of EGO-PLM using <strong>8</strong> datasets across <strong>6</strong> task categories, demonstrating consistent and substantial improvements compared to state-of-the-art baselines.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"191 ","pages":"Article 107754"},"PeriodicalIF":6.0000,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Shaping pre-trained language models for task-specific embedding generation via consistency calibration\",\"authors\":\"Jianqi Gao ,&nbsp;Hang Yu ,&nbsp;Yiu-ming Cheung ,&nbsp;Jian Cao ,&nbsp;Raymond Chi-Wing Wong ,&nbsp;Yonggang Zhang\",\"doi\":\"10.1016/j.neunet.2025.107754\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Pre-trained language models (PLMs) have shown significant success in various downstream tasks by providing initial parameters for task-specific fine-tuning. An inherent challenge of this approach is that adapting solely to downstream tasks may lead to the forgetting of pre-trained knowledge, resulting in limited fine-tuning performance on downstream tasks. To tackle this challenge, we propose a novel approach called EGO-PLM, where PLMs serve as task-specific <u>e</u>mbedding <u>g</u>enerat<u>o</u>r. The underlying insight of EGO-PLM is to align the fine-tuning tasks for PLMs with those utilized during the pre-training phase. Within this framework, we design a task-agnostic pre-defined task that is similar to the pre-training phase and a task-specific embedding generator to adapt to specific tasks, enabling the specific task can be trained jointly with the pre-defined task. 
To alleviate task conflicts between pre-defined and task-specific tasks and make sure the generated embedding are task-specific, we propose <em>co</em>nsistency <em>ca</em>libration (CoCa), which aligns the pre-defined objectives with the task-specific ones. Specifically, CoCa identifies inconsistencies between the pre-defined and task-specific objectives in an adversarial manner, subsequently calibrating these disparities through adversarial training. We validate the effectiveness of EGO-PLM using <strong>8</strong> datasets across <strong>6</strong> task categories, demonstrating consistent and substantial improvements compared to state-of-the-art baselines.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"191 \",\"pages\":\"Article 107754\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025006343\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025006343","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Pre-trained language models (PLMs) have shown significant success in various downstream tasks by providing initial parameters for task-specific fine-tuning. An inherent challenge of this approach is that adapting solely to downstream tasks may lead to forgetting of pre-trained knowledge, resulting in limited fine-tuning performance on those tasks. To tackle this challenge, we propose a novel approach called EGO-PLM, in which PLMs serve as task-specific embedding generators. The underlying insight of EGO-PLM is to align the fine-tuning tasks for PLMs with those used during the pre-training phase. Within this framework, we design a task-agnostic pre-defined task that resembles the pre-training phase and a task-specific embedding generator that adapts to specific tasks, so that the specific task can be trained jointly with the pre-defined one. To alleviate conflicts between the pre-defined and task-specific tasks and to ensure that the generated embeddings are task-specific, we propose consistency calibration (CoCa), which aligns the pre-defined objectives with the task-specific ones. Specifically, CoCa identifies inconsistencies between the pre-defined and task-specific objectives in an adversarial manner and then calibrates these disparities through adversarial training. We validate the effectiveness of EGO-PLM on 8 datasets across 6 task categories, demonstrating consistent and substantial improvements over state-of-the-art baselines.
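The abstract gives no implementation details, so the sketch below only illustrates the joint-training idea under explicit assumptions: masked language modeling stands in for the pre-defined, pre-training-like task, a linear classifier over a mean-pooled embedding stands in for the task-specific task, and a simple FGSM-style perturbation of the generated embedding stands in for the adversarial consistency-calibration step. The names `EgoPLMSketch` and `joint_loss` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of EGO-PLM-style joint training (assumptions throughout):
# MLM as the pre-defined task, a linear classifier over a mean-pooled
# embedding as the task-specific task, and an FGSM-style adversarial
# perturbation as a stand-in for consistency calibration (CoCa).
import torch
import torch.nn as nn
import torch.nn.functional as F


class EgoPLMSketch(nn.Module):
    """Toy stand-in for a PLM backbone used as a task-specific embedding generator."""

    def __init__(self, vocab_size=30522, hidden=256, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlm_head = nn.Linear(hidden, vocab_size)  # pre-defined (MLM-like) task
        self.cls_head = nn.Linear(hidden, num_labels)  # task-specific task

    def forward(self, input_ids):
        return self.encoder(self.embed(input_ids))  # (batch, seq_len, hidden)


def joint_loss(model, input_ids, mlm_labels, cls_labels, lam=1.0, eps=1e-2):
    h = model(input_ids)
    pooled = h.mean(dim=1)  # generated sentence embedding

    # Pre-defined objective: predict masked tokens, as in pre-training.
    mlm_loss = F.cross_entropy(
        model.mlm_head(h).flatten(0, 1), mlm_labels.flatten(), ignore_index=-100
    )
    # Task-specific objective computed on the generated embedding.
    cls_loss = F.cross_entropy(model.cls_head(pooled), cls_labels)

    # Stand-in for CoCa: find a small perturbation of the embedding that
    # increases the task-specific loss, then train the model to remain
    # consistent under that perturbation (adversarial calibration).
    (grad,) = torch.autograd.grad(cls_loss, pooled, retain_graph=True)
    delta = eps * grad.sign().detach()
    coca_loss = F.cross_entropy(model.cls_head(pooled + delta), cls_labels)

    return mlm_loss + cls_loss + lam * coca_loss


# Usage with random data, just to show the shapes involved.
model = EgoPLMSketch()
input_ids = torch.randint(0, 30522, (4, 16))
mlm_labels = input_ids.clone()
mlm_labels[:, ::2] = -100  # supervise only a subset of token positions
cls_labels = torch.randint(0, 2, (4,))
joint_loss(model, input_ids, mlm_labels, cls_labels).backward()
```

The single backward pass through the summed loss is what "joint training" means here; how the actual method weights the two objectives or constructs the adversarial calibration is not specified in the abstract.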
Source journal
Neural Networks (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles per year: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.