DTDA: Dual-channel Triple-to-quintuple Data Augmentation for Comparative Opinion Quintuple Extraction

IF 7.2 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2024-11-20 DOI:10.1016/j.knosys.2024.112734

Qingting Xu, Kaisong Song, Yangyang Kang, Chaoqun Liu, Yu Hong, Guodong Zhou

Knowledge-Based Systems, Volume 307, Article 112734. Full text: https://www.sciencedirect.com/science/article/pii/S0950705124013686

Citations: 0


DTDA: Dual-channel Triple-to-quintuple Data Augmentation for Comparative Opinion Quintuple Extraction

Comparative Opinion Quintuple Extraction (COQE) is an essential task in sentiment analysis that entails the extraction of quintuples from comparative sentences. Each quintuple comprises a subject, an object, a shared aspect for comparison, a comparative opinion and a distinct preference. The prevalent reliance on extensively annotated datasets inherently constrains the efficiency of training. Manual data labeling is both time-consuming and labor-intensive, especially labeling quintuple data. Herein, we propose a Dual-channel Triple-to-quintuple Data Augmentation (DTDA) approach for the COQE task. In particular, we leverage ChatGPT to generate domain-specific triple data. Subsequently, we utilize these generated data and existing Aspect Sentiment Triplet Extraction (ASTE) data for separate preliminary fine-tuning. On this basis, we employ the two fine-tuned triple models for warm-up and construct a dual-channel quintuple model using the unabridged quintuples. We evaluate our approach on three benchmark datasets: Camera-COQE, Car-COQE and Ele-COQE. Our approach exhibits substantial improvements versus pipeline-based, joint, and T5-based baselines. Notably, the DTDA method significantly outperforms the best pipeline method, with exact match F1-score increasing by 10.32%, 8.97%, and 10.65% on Camera-COQE, Car-COQE and Ele-COQE, respectively. More importantly, our data augmentation method can adapt to any baselines. When integrated with the current SOTA UniCOQE method, it further improves performance by 0.34%, 1.65%, and 2.22%, respectively. We will make all related models and source code publicly available upon acceptance.
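To make the quintuple structure and the exact match metric concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes a prediction counts as correct only when all five elements equal those of a gold quintuple for the same sentence, and that scores are micro-averaged over sentences. The names `Quintuple` and `exact_match_f1`, the example sentence content, and the preference labels used are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Quintuple(NamedTuple):
    """One comparative opinion, with the five elements named in the abstract."""
    subject: str
    object: str
    aspect: str
    opinion: str
    preference: str  # label values here are an assumption, e.g. "better" or "worse"


def exact_match_f1(
    predicted: List[List[Quintuple]],
    gold: List[List[Quintuple]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under exact matching.

    predicted[i] and gold[i] hold the quintuples for sentence i.
    A predicted quintuple is a true positive only if all five
    elements match a gold quintuple of the same sentence.
    """
    true_pos = n_pred = n_gold = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        pred_set, gold_set = set(pred_sent), set(gold_sent)
        true_pos += len(pred_set & gold_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical sentence with two gold quintuples.
gold = [[
    Quintuple("Canon", "Nikon", "lens", "sharper", "better"),
    Quintuple("Canon", "Nikon", "price", "higher", "worse"),
]]
# The second prediction gets the opinion span wrong, so it does not count.
predicted = [[
    Quintuple("Canon", "Nikon", "lens", "sharper", "better"),
    Quintuple("Canon", "Nikon", "price", "high", "worse"),
]]

precision, recall, f1 = exact_match_f1(predicted, gold)
print(precision, recall, f1)
```

Under this strict criterion a single wrong element discards the whole quintuple, which is why exact match scores for five-element extraction tend to be much lower than element-level scores.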


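Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months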

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.