Cross-domain Sentiment Classification in Spanish

Lautaro Estienne, Matías Vera, L. Vega
2022 IEEE Biennial Congress of Argentina (ARGENCON). Published 2022-09-07. DOI: https://doi.org/10.1109/ARGENCON55245.2022.9940056

Abstract

Sentiment classification is a fundamental task in natural language processing with important academic and commercial applications. It aims to automatically predict the degree of sentiment in text that expresses opinions or subjectivity, such as product reviews, movie reviews, or tweets. The task is difficult in part because different text domains contain different words and expressions. The difficulty increases for text written in languages other than English, owing to the lack of databases and resources. Consequently, cross-domain and cross-language techniques are often applied to improve results. In this work we study how well a classification system trained on a large database of product reviews generalizes to other Spanish-language domains. Reviews were collected from the MercadoLibre website across seven Latin American countries, which allowed the creation of a large, balanced dataset. Results suggest that generalization across domains is feasible, though very challenging, when training on these product reviews, and that it can be improved by pre-training and fine-tuning the classification model.
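The cross-domain setup described in the abstract (train on product reviews, then evaluate on a different Spanish-language domain) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the classifier, so a small multinomial Naive Bayes model stands in for it, and every review, label, and function name below (`train_nb`, `predict`, `accuracy`, the toy data) is invented for the example and is not taken from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization on lowercased text; a real system would do more.
    return text.lower().split()


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["doc_counts"]):
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(model["doc_counts"][label] / total_docs)
        for tok in tokenize(text):
            # Words never seen in the source domain carry no evidence here,
            # which is one reason performance can degrade across domains.
            if tok in model["vocab"]:
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    return sum(predict(model, t) == y for t, y in examples) / len(examples)


# Hypothetical toy data: source domain = product reviews.
source_train = [
    ("excelente producto llegó rápido", "pos"),
    ("muy buena calidad lo recomiendo", "pos"),
    ("producto defectuoso no funciona", "neg"),
    ("mala calidad llegó roto", "neg"),
]
source_test = [
    ("buena calidad excelente", "pos"),
    ("llegó roto no funciona", "neg"),
]
# Hypothetical toy data: target domain = movie reviews.
target_test = [
    ("película aburrida y lenta", "neg"),
    ("guion brillante y conmovedor", "pos"),
]

model = train_nb(source_train)
source_acc = accuracy(model, source_test)
target_acc = accuracy(model, target_test)
print(f"in-domain accuracy:    {source_acc:.2f}")
print(f"cross-domain accuracy: {target_acc:.2f}")
```

The point of the sketch is the evaluation protocol rather than the model: the classifier is fitted once on the source domain and then scored separately on held-out source data and on target-domain data, so the gap between the two accuracies gives a rough measure of how well it generalizes.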