任务12:低资源语言中情感分类的多语言微调

International Workshop on Semantic Evaluation Pub Date : 2023-04-27 DOI:10.48550/arXiv.2304.14189

Egil Rønningstad

{"title":"任务12:低资源语言中情感分类的多语言微调","authors":"Egil Rønningstad","doi":"10.48550/arXiv.2304.14189","DOIUrl":null,"url":null,"abstract":"Our contribution to the 2023 AfriSenti-SemEval shared task 12: Sentiment Analysis for African Languages, provides insight into how a multilingual large language model can be a resource for sentiment analysis in languages not seen during pretraining. The shared task provides datasets of a variety of African languages from different language families. The languages are to various degrees related to languages used during pretraining, and the language data contain various degrees of code-switching. We experiment with both monolingual and multilingual datasets for the final fine-tuning, and find that with the provided datasets that contain samples in the thousands, monolingual fine-tuning yields the best results.","PeriodicalId":444285,"journal":{"name":"International Workshop on Semantic Evaluation","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment classification in low-resource Languages\",\"authors\":\"Egil Rønningstad\",\"doi\":\"10.48550/arXiv.2304.14189\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Our contribution to the 2023 AfriSenti-SemEval shared task 12: Sentiment Analysis for African Languages, provides insight into how a multilingual large language model can be a resource for sentiment analysis in languages not seen during pretraining. The shared task provides datasets of a variety of African languages from different language families. The languages are to various degrees related to languages used during pretraining, and the language data contain various degrees of code-switching. We experiment with both monolingual and multilingual datasets for the final fine-tuning, and find that with the provided datasets that contain samples in the thousands, monolingual fine-tuning yields the best results.\",\"PeriodicalId\":444285,\"journal\":{\"name\":\"International Workshop on Semantic Evaluation\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Workshop on Semantic Evaluation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2304.14189\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Semantic Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.14189","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

我们对2023年afrisentit - semeval共享任务12的贡献:非洲语言的情感分析，提供了对多语言大型语言模型如何成为预训练中未见的语言情感分析资源的见解。共享任务提供了来自不同语系的各种非洲语言的数据集。这些语言在不同程度上与预训练中使用的语言相关，并且语言数据包含不同程度的代码切换。我们对单语言和多语言数据集进行了实验，以进行最终的微调，并发现对于包含数千个样本的数据集，单语言微调产生了最好的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment classification in low-resource Languages

Our contribution to the 2023 AfriSenti-SemEval shared task 12: Sentiment Analysis for African Languages, provides insight into how a multilingual large language model can be a resource for sentiment analysis in languages not seen during pretraining. The shared task provides datasets of a variety of African languages from different language families. The languages are to various degrees related to languages used during pretraining, and the language data contain various degrees of code-switching. We experiment with both monolingual and multilingual datasets for the final fine-tuning, and find that with the provided datasets that contain samples in the thousands, monolingual fine-tuning yields the best results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Workshop on Semantic Evaluation

自引率

0.00%

发文量