Empirical Analysis on the State of Transfer Learning for Small Data Text Classification Tasks Using Contextual Embeddings

F. Carvalho, C. Castro
{"title":"基于上下文嵌入的小数据文本分类任务迁移学习状态实证分析","authors":"F. Carvalho, C. Castro","doi":"10.21528/cbic2019-82","DOIUrl":null,"url":null,"abstract":"Recent developments in the NLP (Natural Language Processing) field have shown that deep transformer based language model architectures trained on a large corpus of unlabeled data are able to transfer knowledge to downstream tasks efficiently through fine-tuning. In particular, BERT and XLNet have shown impressive results, achieving state of the art performance in many tasks through this process. This is partially due to the ability these models have to create better representations of text in the form of contextual embeddings. However not much has been explored in the literature about the robustness of the transfer learning process of these models on a small data scenario. Also not a lot of effort has been put on analysing the behaviour of the two models fine-tuning process with different amounts of training data available. This paper addresses these questions through an empirical evaluation of these models on some datasets when finetuned on progressively smaller fractions of training data, for the task of text classification. It is shown that BERT and XLNet perform well with small data and can achieve good performance with very few labels available, in most cases. Results yielded with varying fractions of training data indicate that few examples are necessary in order to fine-tune the models and, although there is a positive effect in training with more labeled data, using only a subset of data is already enough to achieve a comparable performance with traditional non-deep learning models trained with substantially more data. Also it is noticeable how quickly the transfer learning curve of these methods saturate, reinforcing their ability to perform well with less data available. Keywords—Small data, text classification, NLP, contextual embeddings, representation learning, deep learning","PeriodicalId":160474,"journal":{"name":"Anais do 14. Congresso Brasileiro de Inteligência Computacional","volume":"69 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Empirical Analysis on the State of Transfer Learning for Small Data Text Classification Tasks Using Contextual Embeddings\",\"authors\":\"F. Carvalho, C. Castro\",\"doi\":\"10.21528/cbic2019-82\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent developments in the NLP (Natural Language Processing) field have shown that deep transformer based language model architectures trained on a large corpus of unlabeled data are able to transfer knowledge to downstream tasks efficiently through fine-tuning. In particular, BERT and XLNet have shown impressive results, achieving state of the art performance in many tasks through this process. This is partially due to the ability these models have to create better representations of text in the form of contextual embeddings. However not much has been explored in the literature about the robustness of the transfer learning process of these models on a small data scenario. Also not a lot of effort has been put on analysing the behaviour of the two models fine-tuning process with different amounts of training data available. 
This paper addresses these questions through an empirical evaluation of these models on some datasets when finetuned on progressively smaller fractions of training data, for the task of text classification. It is shown that BERT and XLNet perform well with small data and can achieve good performance with very few labels available, in most cases. Results yielded with varying fractions of training data indicate that few examples are necessary in order to fine-tune the models and, although there is a positive effect in training with more labeled data, using only a subset of data is already enough to achieve a comparable performance with traditional non-deep learning models trained with substantially more data. Also it is noticeable how quickly the transfer learning curve of these methods saturate, reinforcing their ability to perform well with less data available. Keywords—Small data, text classification, NLP, contextual embeddings, representation learning, deep learning\",\"PeriodicalId\":160474,\"journal\":{\"name\":\"Anais do 14. Congresso Brasileiro de Inteligência Computacional\",\"volume\":\"69 2\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do 14. Congresso Brasileiro de Inteligência Computacional\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21528/cbic2019-82\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do 14. Congresso Brasileiro de Inteligência Computacional","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21528/cbic2019-82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent developments in the field of Natural Language Processing (NLP) have shown that deep Transformer-based language model architectures, pre-trained on large corpora of unlabeled data, can transfer knowledge to downstream tasks efficiently through fine-tuning. In particular, BERT and XLNet have shown impressive results, achieving state-of-the-art performance on many tasks through this process. This is partially due to the ability of these models to create better representations of text in the form of contextual embeddings. However, the robustness of these models' transfer learning process in small-data scenarios has received little attention in the literature, and few studies have analysed how the fine-tuning of the two models behaves as the amount of available training data varies. This paper addresses these questions through an empirical evaluation of these models on several datasets when fine-tuned on progressively smaller fractions of the training data, for the task of text classification. It is shown that BERT and XLNet perform well with small data and, in most cases, achieve good performance with very few labels available. Results obtained with varying fractions of training data indicate that few examples are necessary to fine-tune the models and that, although training with more labeled data has a positive effect, a subset of the data is already enough to match the performance of traditional non-deep-learning models trained with substantially more data. It is also noticeable how quickly the transfer learning curves of these methods saturate, reinforcing their ability to perform well when less data is available.

Keywords: small data, text classification, NLP, contextual embeddings, representation learning, deep learning
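To make the evaluation protocol concrete, below is a minimal sketch (not the authors' code) of how one might fine-tune BERT on a progressively smaller fraction of a labeled training set for text classification, using the Hugging Face `transformers` and `datasets` libraries. The dataset (`ag_news`), the fraction, and all hyperparameters are illustrative assumptions rather than the paper's exact setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

FRACTION = 0.1  # fine-tune on only 10% of the labeled training data (assumed value)

# Assumed dataset: AG News (4 classes); any dataset with "text"/"label" columns works.
raw = load_dataset("ag_news")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# Keep only a random fraction of the training split, mimicking the
# progressively-smaller-training-set setting described in the abstract.
train = raw["train"].shuffle(seed=42)
train = train.select(range(int(len(train) * FRACTION)))
train = train.map(tokenize, batched=True)
test = raw["test"].map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)

args = TrainingArguments(
    output_dir="bert-small-data",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train, eval_dataset=test).train()
```

Sweeping `FRACTION` over several values (e.g. 0.01, 0.05, 0.1, 0.5, 1.0) and recording test accuracy at each point would trace the kind of transfer learning curve whose early saturation the abstract reports.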