Towards Generalization of Machine Learning Models: A Case Study of Arabic Sentiment Analysis

International Conference on Web and Social Media, published 2023-06-02
DOI: 10.1609/icwsm.v17i1.22204

Samir Abdaljalil, S. Hassanein, Hamdy Mubarak, Ahmed Abdelali


Citations: 0

Abstract





The abundance of social media data in the Arab world, particularly on Twitter, has enabled companies and other organizations to mine this data for valuable information, including people's sentiments and opinions about a topic or a product. This abundance, however, raises the problem of building models that deliver consistent results when tested in different contexts. Although model generalization has been studied extensively in many fields, it has received little attention in the Arabic context. To address this gap, we investigate the generalization of models and data in Arabic sentiment analysis. We run a battery of experiments and build different models, then test them on five independent test sets to understand how they perform on unseen data. We detail techniques that improve the generalization of machine learning models for Arabic sentiment analysis. We also share a large, versatile dataset of approximately 1.64M Arabic tweets with their sentiment labels for future research. Our experiments show that the most consistent model is trained on a dataset labelled by a cascade of two models. The first labels neutral tweets, and the second identifies positive/negative tweets based on the Arabic emoji lexicon after class balancing. BERT and SVM models trained on this refined data achieve average F-1 scores of 0.62 and 0.60, with standard deviations of 0.06 and 0.04 respectively, across the five diverse test sets. They outperform the other models by a relative F-1 gain of at least 17%. Based on our experiments, we offer recommendations for improving model generalization in classification tasks.
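The abstract describes labelling training data with a two-stage cascade. One model flags neutral tweets, and an emoji-lexicon-based stage assigns positive or negative to the rest. The classes are then balanced. The Python sketch below illustrates only that pipeline shape, under several assumptions that are not taken from the paper:

- `toy_is_neutral` is a hypothetical keyword stub standing in for the authors' trained neutral model.
- `TOY_EMOJI_LEXICON` is a hypothetical eight-entry lexicon, not the Arabic emoji lexicon they used.
- The second stage is a direct lexicon lookup, whereas the paper describes a model that relies on the lexicon.
- Tweets with no net emoji signal are dropped.
- Balancing is random downsampling to the smallest class, which is one common choice and not necessarily the authors'.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL toy lexicon: emoji -> polarity (+1 positive, -1 negative).
# This is NOT the Arabic emoji lexicon used in the paper.
TOY_EMOJI_LEXICON: Dict[str, int] = {
    "😀": 1, "😍": 1, "👍": 1, "❤": 1,
    "😡": -1, "😢": -1, "👎": -1, "💔": -1,
}


def toy_is_neutral(text: str) -> bool:
    """HYPOTHETICAL stub for the stage-1 neutral model: treats tweets that
    start with "عاجل" ("breaking") as neutral news. A trained classifier
    would replace this function."""
    return text.startswith("عاجل")


def emoji_polarity(text: str, lexicon: Dict[str, int]) -> Optional[str]:
    """Sum lexicon scores over the tweet's characters; return 'pos', 'neg',
    or None when there is no net emoji signal (no lexicon emoji, or a tie)."""
    score = sum(lexicon.get(ch, 0) for ch in text)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def cascade_label(
    texts: List[str],
    is_neutral: Callable[[str], bool],
    lexicon: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Stage 1 assigns 'neu'; stage 2 labels the remaining tweets by emoji
    polarity. Tweets that stage 2 cannot label are dropped."""
    labelled: List[Tuple[str, str]] = []
    for text in texts:
        if is_neutral(text):
            labelled.append((text, "neu"))
            continue
        polarity = emoji_polarity(text, lexicon)
        if polarity is not None:
            labelled.append((text, polarity))
    return labelled


def balance_by_downsampling(
    data: List[Tuple[str, str]], seed: int = 0
) -> List[Tuple[str, str]]:
    """Randomly downsample every class to the size of the smallest class."""
    by_label: Dict[str, List[Tuple[str, str]]] = {}
    for text, label in data:
        by_label.setdefault(label, []).append((text, label))
    if not by_label:
        return []
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced: List[Tuple[str, str]] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    return balanced


tweets = [
    "عاجل: ارتفاع درجات الحرارة غدا",  # neutral per the stub
    "يوم جميل 😀",
    "أحب هذا المنتج 😍👍",
    "خدمة سيئة 😡",
    "للأسف 😢💔",
    "لا تعليق",  # not neutral per the stub and no emoji, so dropped
    "رائع 👍",
]
labelled = cascade_label(tweets, toy_is_neutral, TOY_EMOJI_LEXICON)
balanced = balance_by_downsampling(labelled)
print(Counter(label for _, label in labelled))
print(Counter(label for _, label in balanced))
```

To adapt this to real data, replace `toy_is_neutral` with a trained neutral classifier and `TOY_EMOJI_LEXICON` with an actual emoji lexicon. Because the lookup works per code point, modifier and ZWJ emoji sequences are scored by their component characters rather than as distinct emoji. Downsampling discards data; oversampling the minority classes is the usual alternative when the smallest class is very small.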
