Multi-Task Learning in Natural Language Processing: An Overview

IF 23.8 | Zone 1: Computer Science | Q1: Computer Science, Theory & Methods

ACM Computing Surveys. Pub Date: 2024-05-11. DOI: 10.1145/3663363

Shijie Chen, Yu Zhang, Qiang Yang

Citations: 0

Abstract

Deep learning approaches have achieved great success in the field of Natural Language Processing (NLP). However, directly training deep neural models often suffers from the overfitting and data scarcity problems that are pervasive in NLP tasks. In recent years, Multi-Task Learning (MTL), which leverages useful information from related tasks to improve performance on all of them simultaneously, has been used to handle these problems. In this paper, we give an overview of the use of MTL in NLP tasks. We first review the MTL architectures used in NLP tasks and categorize them into four classes: parallel architecture, hierarchical architecture, modular architecture, and generative adversarial architecture. We then present optimization techniques for loss construction, gradient regularization, data sampling, and task scheduling that are needed to train a multi-task model properly. After presenting applications of MTL in a variety of NLP tasks, we introduce some benchmark datasets. Finally, we conclude and discuss several possible research directions in this field.
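
Two of the optimization topics named in the abstract, loss construction and data sampling, can be illustrated with a minimal Python sketch. This is not taken from the survey: it assumes the simplest common choices, a weighted sum of per-task losses and temperature-scaled sampling by dataset size. The task names, dataset sizes, weights, and function names are all hypothetical.

```python
import random


def combined_loss(task_losses, task_weights):
    """Weighted sum of per-task losses (one simple form of loss construction)."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


def sampling_probs(dataset_sizes, temperature=1.0):
    """Probability of drawing each task, proportional to size ** (1 / temperature).

    temperature=1.0 gives size-proportional sampling; larger values flatten
    the distribution toward uniform, so small tasks are drawn more often.
    """
    scaled = {name: size ** (1.0 / temperature) for name, size in dataset_sizes.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def sample_task(dataset_sizes, temperature=1.0, rng=random):
    """Pick the task that supplies the next mini-batch."""
    probs = sampling_probs(dataset_sizes, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


if __name__ == "__main__":
    sizes = {"ner": 1000, "pos": 9000}  # hypothetical tasks and dataset sizes
    print(sampling_probs(sizes, temperature=1.0))
    print(sampling_probs(sizes, temperature=2.0))
    print(combined_loss({"ner": 0.5, "pos": 0.25}, {"ner": 1.0, "pos": 2.0}))
```

In this sketch, raising the temperature moves sampling away from the largest dataset, which is one way to keep a low-resource task from being starved during joint training. Fixed loss weights are only the simplest option among the loss-construction techniques the abstract refers to.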

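Source journal

ACM Computing Surveys (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 0.60%

Articles published: 372

Review time: 12 months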

Journal description: ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation, with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science, Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.