Khanh Quoc Tran, Quang Phan-Minh Huynh, Oanh Thi-Hong Le, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Title: ViTASA: New benchmark and methods for Vietnamese targeted aspect sentiment analysis for multiple textual domains
Journal: Computer Speech and Language, Vol. 93, Article 101800 (JCR Q2, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.csl.2025.101800
Published: 2025-03-27 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000257
Citations: 0
Abstract
Targeted Aspect Sentiment Analysis (TASA) has gained substantial traction in recent years, fostering diverse studies and technological advancements. However, the development of TASA resources for Vietnamese has been limited. This paper introduces ViTASA, a comprehensive, high-quality dataset designed to catalyze advancements in Vietnamese TASA. ViTASA encompasses over 500,000 target-aspect pairs from social media comments across three key domains: mobile, restaurant, and hotel, thereby addressing critical gaps in existing datasets. Additionally, ViTASA integrates a novel multi-task evaluation framework, posing new challenges and enabling robust model assessments. We present ViTASD, an innovative BERT-based approach optimized for the linguistic features of Vietnamese. Comparative analyses demonstrate that ViTASD significantly outperforms existing state-of-the-art methods, including CG-BERT, QACG-BERT, BERT-pair-QA, BERT-pair-NLI, and a range of zero-shot learning models such as Gemma, Llama, Mistral, and Qwen. Notably, ViTASD achieves superior macro F1-scores of 61.77%, 41.12%, and 52.64% in the mobile, restaurant, and hotel domains respectively. This study not only highlights the challenges inherent in Vietnamese sentiment analysis but also lays a robust foundation for future research endeavors in this area. In a commitment to advancing TASA technology and enhancing the reliability of digital media analyses, we have made the ViTASA dataset, model checkpoints, and source code openly accessible on GitHub.
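The macro F1-scores reported above are the unweighted mean of per-class F1 scores, which weights rare sentiment classes equally with frequent ones. As a reference for how that metric is computed (a minimal sketch; the label values below are hypothetical and not taken from ViTASA):

```python
# Macro F1: compute precision, recall, and F1 per class, then average
# the F1 values without weighting by class frequency.
def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical three-way sentiment predictions (not ViTASA data)
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]
print(macro_f1(y_true, y_pred))
```

Because every class contributes equally, a model that ignores a minority class (e.g., "neutral" above, which scores F1 = 0) is penalized heavily, which is why macro F1 is a common choice for imbalanced sentiment datasets.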
Journal description:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.