When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training

Proceedings of the Second Workshop on Insights from Negative Results in NLP. DOI: 10.18653/v1/2021.insights-1.9

Qi Zhu, Yuxian Gu, Lingxiao Luo, Bing Li, Cheng Li, Wei Peng, Minlie Huang, Xiaoyan Zhu

Citations: 10

Abstract

Further pre-training language models on in-domain data (domain-adaptive pre-training, DAPT) or task-relevant data (task-adaptive pre-training, TAPT) before fine-tuning has been shown to improve performance on downstream tasks. However, in task-oriented dialog modeling, we observe that further pre-training with the masked language modeling (MLM) objective does not always boost performance on a downstream task. We find that DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling the size of the DAPT data does not help. Through Representational Similarity Analysis, we conclude that more fine-tuning data yields a greater change in the model’s representations and thus reduces the influence of initialization.
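To make the Representational Similarity Analysis step concrete, here is a minimal sketch of one common way to compute an RSA score between two sets of representations of the same inputs. The abstract does not specify which layers, inputs, or similarity measure the authors used, so the choices below (cosine similarity within each model, Pearson correlation across models) and the function names `pairwise_cosine` and `rsa_score` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def pairwise_cosine(reps):
    """Return the (n, n) cosine-similarity matrix for an (n, d) array."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def rsa_score(reps_a, reps_b):
    """Hypothetical RSA score: Pearson correlation between the pairwise
    similarity structures of two models over the same n inputs.

    reps_a: (n, d_a) array, e.g. hidden states from a model before fine-tuning
    reps_b: (n, d_b) array, e.g. hidden states from the model after fine-tuning
    The hidden sizes d_a and d_b may differ; only n must match.
    """
    if reps_a.shape[0] != reps_b.shape[0]:
        raise ValueError("both models must encode the same n inputs")
    n = reps_a.shape[0]
    upper = np.triu_indices(n, k=1)  # each unordered pair of inputs once
    sim_a = pairwise_cosine(reps_a)[upper]
    sim_b = pairwise_cosine(reps_b)[upper]
    return float(np.corrcoef(sim_a, sim_b)[0, 1])
```

Under this sketch, a score near 1 means the two models arrange the inputs in nearly the same way, while a lower score between the pre-fine-tuning and post-fine-tuning representations indicates a larger change, which is the kind of shift the abstract links to larger fine-tuning sets.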
