MuSe-CarASTE: A comprehensive dataset for aspect sentiment triplet extraction in automotive review videos

IF 7.5 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)

Expert Systems with Applications, Volume 262, Article 125695 | Pub Date: 2024-11-06 | DOI: 10.1016/j.eswa.2024.125695

Atiya Usmani, Saeed Hamood Alsamhi, Muhammad Jaleed Khan, John Breslin, Edward Curry


Citations: 0

Abstract


In the Aspect-Based Sentiment Analysis (ABSA) domain, the Aspect Sentiment Triplet Extraction (ASTE) task has emerged as a pivotal endeavor, offering insights into nuanced aspects, opinions, and sentiment relationships. This paper introduces “MuSe-CarASTE”, an extensive and meticulously curated dataset purpose-built to propel ASTE advancements within the automotive domain. The core emphasis of MuSe-CarASTE is on aspect, opinion, and sentiment triplets, facilitating a comprehensive analysis of product reviews. Comprising transcripts from MuSe-Car’s automotive video reviews, MuSe-CarASTE presents a substantial collection of nearly 28,295 sentences organized into 5,500 segments. Each segment is meticulously annotated with multiple aspects, opinions, and sentiment labels, offering unprecedented granularity for ASTE tasks. The percentage agreement between triplets annotated by different annotators over a randomly sampled subset of the dataset is 79.74%, at similarity threshold τ = 0.60. We also experimented with four baseline models on our dataset and report the results. The distinctiveness of the dataset emerges from its extension into the automotive domain, shedding light on sentiment dynamics specific to vehicles. With the fusion of extensive content and real-world applicability, MuSe-CarASTE presents a fertile ground for Natural Language Processing (NLP) innovation. Researchers, practitioners, and data scientists can harness MuSe-CarASTE to build and evaluate NLP models tailored for challenges in ASTE. These challenges encompass intricate aspect-opinion relationships, multi-word aspect and opinion extraction, and the subtleties of vague language. Moreover, including aspects that do not appear verbatim in sentences introduces a practical dimension to our dataset, enabling real-world applications like review pattern analysis, summarization, and recommender system enhancement. As a pioneering benchmark for NLP model evaluation in ABSA, MuSe-CarASTE integrates content richness, real-world context, and sentiment complexity. The integration empowers the development of accurate, adaptable, and insightful sentiment analysis models within the automotive review landscape.
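The abstract reports a percentage agreement between annotators at a similarity threshold τ = 0.60 but does not say how triplet similarity is measured. The sketch below illustrates one possible reading and is not the authors' procedure. The `Triplet` class, the `similarity` function (mean character-level `difflib` ratio over aspect and opinion, zero when sentiments differ), the one-directional matching rule, and the example triplets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one (aspect, opinion, sentiment) annotation."""
    aspect: str
    opinion: str
    sentiment: str  # e.g. "positive", "negative", "neutral"


def similarity(a: Triplet, b: Triplet) -> float:
    """Assumed measure: mean string ratio of aspect and opinion, 0.0 if sentiments differ."""
    if a.sentiment != b.sentiment:
        return 0.0
    aspect_sim = SequenceMatcher(None, a.aspect.lower(), b.aspect.lower()).ratio()
    opinion_sim = SequenceMatcher(None, a.opinion.lower(), b.opinion.lower()).ratio()
    return (aspect_sim + opinion_sim) / 2


def percentage_agreement(first, second, tau=0.60):
    """Percent of triplets in `first` with at least one triplet in `second` at similarity >= tau."""
    if not first:
        return 0.0
    matched = sum(1 for t in first if any(similarity(t, u) >= tau for u in second))
    return 100.0 * matched / len(first)


# Invented example annotations for a single segment (not taken from the dataset).
annotator_a = [
    Triplet("steering wheel", "feels great", "positive"),
    Triplet("boot space", "tiny", "negative"),
]
annotator_b = [
    Triplet("steering", "feels really great", "positive"),
    Triplet("fuel economy", "poor", "negative"),
]

print(percentage_agreement(annotator_a, annotator_b, tau=0.60))
```

A soft threshold like this lets multi-word spans that differ slightly between annotators ("steering wheel" versus "steering") still count as agreement, which matches the multi-word extraction challenge the abstract mentions. A token-level or embedding-based similarity could be substituted for `similarity` without changing the rest of the sketch.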


Source journal

Expert Systems with Applications | Engineering & Technology: Engineering, Electrical & Electronic

CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.