Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data

IF 3.1 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language Pub Date : 2024-07-31 DOI:10.1016/j.csl.2024.101704

Vlad-Andrei Negru, Vasile Suciu, Alex-Mihai Lăpuşan, Camelia Lemnaru, Mihaela Dînşoreanu, Rodica Potolea

Computer Speech and Language, volume 89, Article 101704. Article page: https://www.sciencedirect.com/science/article/pii/S0885230824000871

Citations: 0

Abstract

Our work explores the differences between GRU-based and transformer-based approaches to sentiment analysis on text dialog. In addition to overall performance on the downstream task, we assess the knowledge transfer capabilities of the models through a thorough zero-shot analysis at the task level and through cross-lingual evaluation across five European languages. The ability to generalize over different tasks and languages is of high importance, as the data needed for a particular application may be scarce or nonexistent. We perform evaluations on both known benchmark datasets and a novel synthetic dataset for dialog data, containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We leverage the informative power of the conversational context, showing that appending the previous four utterances of the same speaker to the input sequence yields the greatest benefit for inference performance. The cross-lingual and cross-task evaluations show that the transformer-based models possess transfer abilities superior to those of the GRU model, especially in the zero-shot setting. Owing to its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, while the classical BERT obtained the highest zero-shot accuracy on the MELD dataset, with 55.08%.
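The context mechanism mentioned in the abstract (adding the previous four utterances of the same speaker to the input) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_context_inputs`, the `(speaker, utterance)` data layout, the `" [SEP] "` separator, and the choice to place the context before the current utterance are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_context_inputs(dialog, k=4, sep=" [SEP] "):
    """Build one model input string per utterance.

    dialog: list of (speaker, utterance) pairs in conversation order.
    k:      number of previous same-speaker utterances to include
            (the abstract reports four as the best setting).
    sep:    separator string; a hypothetical choice, not from the paper.
    """
    # One bounded history per speaker; deque(maxlen=k) drops the oldest entry.
    history = defaultdict(lambda: deque(maxlen=k))
    inputs = []
    for speaker, utterance in dialog:
        context = list(history[speaker])  # earlier utterances of this speaker only
        inputs.append(sep.join(context + [utterance]))
        history[speaker].append(utterance)
    return inputs


if __name__ == "__main__":
    # Toy call-center style exchange (invented example data).
    dialog = [
        ("customer", "Hello, my internet is down."),
        ("agent", "I am sorry to hear that."),
        ("customer", "It has been down since yesterday."),
        ("agent", "Let me check your line."),
        ("customer", "Thank you."),
    ]
    for text in build_context_inputs(dialog, k=4):
        print(text)
```

Each resulting string would then be tokenized and passed to the classifier; keeping a separate history per speaker means the other party's turns never enter the context window.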


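Source journal

Computer Speech and Language (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

11.30

Self-citation rate

4.70%

Articles published

Review time

22.9 weeks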

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of, and experimentation with, complex models of speech and language processing have become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.