合并数据集用于情感分析

2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW) Pub Date : 2021-11-01 DOI:10.1109/ASEW52652.2021.00051

Ariadna de Arriba, M. Oriol, Xavier Franch

{"title":"合并数据集用于情感分析","authors":"Ariadna de Arriba, M. Oriol, Xavier Franch","doi":"10.1109/ASEW52652.2021.00051","DOIUrl":null,"url":null,"abstract":"Context. Applying sentiment analysis is in general a laborious task. Furthermore, if we add the task of getting a good quality dataset with balanced distribution and enough samples, the job becomes more complicated. Objective. We want to find out whether merging compatible datasets improves emotion analysis based on machine learning (ML) techniques, compared to the original, individual datasets. Method. We obtained two datasets with Covid-19-related tweets written in Spanish, and then built from them two new datasets combining the original ones with different consolidation of balance. We analyzed the results according to precision, recall, F1-score and accuracy. Results. The results obtained show that merging two datasets can improve the performance of ML models, particularly the F1-score, when the merging process follows a strategy that optimizes the balance of the resulting dataset. Conclusions. Merging two datasets can improve the performance of ML models for emotion analysis, whilst saving resources for labeling training data. This might be especially useful for several software engineering activities that leverage on ML-based emotion analysis techniques.","PeriodicalId":349977,"journal":{"name":"2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Merging Datasets for Emotion Analysis\",\"authors\":\"Ariadna de Arriba, M. Oriol, Xavier Franch\",\"doi\":\"10.1109/ASEW52652.2021.00051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Context. Applying sentiment analysis is in general a laborious task. Furthermore, if we add the task of getting a good quality dataset with balanced distribution and enough samples, the job becomes more complicated. Objective. We want to find out whether merging compatible datasets improves emotion analysis based on machine learning (ML) techniques, compared to the original, individual datasets. Method. We obtained two datasets with Covid-19-related tweets written in Spanish, and then built from them two new datasets combining the original ones with different consolidation of balance. We analyzed the results according to precision, recall, F1-score and accuracy. Results. The results obtained show that merging two datasets can improve the performance of ML models, particularly the F1-score, when the merging process follows a strategy that optimizes the balance of the resulting dataset. Conclusions. Merging two datasets can improve the performance of ML models for emotion analysis, whilst saving resources for labeling training data. This might be especially useful for several software engineering activities that leverage on ML-based emotion analysis techniques.\",\"PeriodicalId\":349977,\"journal\":{\"name\":\"2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)\",\"volume\":\"87 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASEW52652.2021.00051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASEW52652.2021.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

上下文。应用情感分析通常是一项费力的任务。此外，如果我们增加了获得具有平衡分布和足够样本的高质量数据集的任务，则工作变得更加复杂。目标。我们想知道与原始的单个数据集相比，合并兼容的数据集是否可以改善基于机器学习(ML)技术的情感分析。方法。我们获得了两个用西班牙语写的与covid -19相关的推文数据集，然后在它们的基础上结合不同整合平衡的原始数据集构建了两个新的数据集。我们从查准率、查全率、f1分和查准率四个方面对结果进行分析。结果。结果表明，当合并过程遵循优化结果数据集平衡的策略时，合并两个数据集可以提高ML模型的性能，特别是F1-score。结论。合并两个数据集可以提高ML模型用于情感分析的性能，同时节省标记训练数据的资源。这对于一些利用基于ml的情感分析技术的软件工程活动可能特别有用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Merging Datasets for Emotion Analysis

Context. Applying sentiment analysis is in general a laborious task. Furthermore, if we add the task of getting a good quality dataset with balanced distribution and enough samples, the job becomes more complicated. Objective. We want to find out whether merging compatible datasets improves emotion analysis based on machine learning (ML) techniques, compared to the original, individual datasets. Method. We obtained two datasets with Covid-19-related tweets written in Spanish, and then built from them two new datasets combining the original ones with different consolidation of balance. We analyzed the results according to precision, recall, F1-score and accuracy. Results. The results obtained show that merging two datasets can improve the performance of ML models, particularly the F1-score, when the merging process follows a strategy that optimizes the balance of the resulting dataset. Conclusions. Merging two datasets can improve the performance of ML models for emotion analysis, whilst saving resources for labeling training data. This might be especially useful for several software engineering activities that leverage on ML-based emotion analysis techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)

自引率

0.00%

发文量