DTDA:用于比较意见五元提取的双通道三元对五元数据增强技术

IF 7.2 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Qingting Xu , Kaisong Song , Yangyang Kang , Chaoqun Liu , Yu Hong , Guodong Zhou
{"title":"DTDA:用于比较意见五元提取的双通道三元对五元数据增强技术","authors":"Qingting Xu ,&nbsp;Kaisong Song ,&nbsp;Yangyang Kang ,&nbsp;Chaoqun Liu ,&nbsp;Yu Hong ,&nbsp;Guodong Zhou","doi":"10.1016/j.knosys.2024.112734","DOIUrl":null,"url":null,"abstract":"<div><div>Comparative Opinion Quintuple Extraction (COQE) is an essential task in sentiment analysis that entails the extraction of quintuples from comparative sentences. Each quintuple comprises a subject, an object, a shared aspect for comparison, a comparative opinion and a distinct preference. The prevalent reliance on extensively annotated datasets inherently constrains the efficiency of training. Manual data labeling is both time-consuming and labor-intensive, especially labeling quintuple data. Herein, we propose a <strong>D</strong>ual-channel <strong>T</strong>riple-to-quintuple <strong>D</strong>ata <strong>A</strong>ugmentation (<strong>DTDA</strong>) approach for the COQE task. In particular, we leverage ChatGPT to generate domain-specific triple data. Subsequently, we utilize these generated data and existing Aspect Sentiment Triplet Extraction (ASTE) data for separate preliminary fine-tuning. On this basis, we employ the two fine-tuned triple models for warm-up and construct a dual-channel quintuple model using the unabridged quintuples. We evaluate our approach on three benchmark datasets: Camera-COQE, Car-COQE and Ele-COQE. Our approach exhibits substantial improvements versus pipeline-based, joint, and T5-based baselines. Notably, the DTDA method significantly outperforms the best pipeline method, with exact match <span><math><mi>F</mi></math></span>1-score increasing by 10.32%, 8.97%, and 10.65% on Camera-COQE, Car-COQE and Ele-COQE, respectively. More importantly, our data augmentation method can adapt to any baselines. When integrated with the current SOTA UniCOQE method, it further improves performance by 0.34%, 1.65%, and 2.22%, respectively. We will make all related models and source code publicly available upon acceptance.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"307 ","pages":"Article 112734"},"PeriodicalIF":7.2000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DTDA: Dual-channel Triple-to-quintuple Data Augmentation for Comparative Opinion Quintuple Extraction\",\"authors\":\"Qingting Xu ,&nbsp;Kaisong Song ,&nbsp;Yangyang Kang ,&nbsp;Chaoqun Liu ,&nbsp;Yu Hong ,&nbsp;Guodong Zhou\",\"doi\":\"10.1016/j.knosys.2024.112734\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Comparative Opinion Quintuple Extraction (COQE) is an essential task in sentiment analysis that entails the extraction of quintuples from comparative sentences. Each quintuple comprises a subject, an object, a shared aspect for comparison, a comparative opinion and a distinct preference. The prevalent reliance on extensively annotated datasets inherently constrains the efficiency of training. Manual data labeling is both time-consuming and labor-intensive, especially labeling quintuple data. Herein, we propose a <strong>D</strong>ual-channel <strong>T</strong>riple-to-quintuple <strong>D</strong>ata <strong>A</strong>ugmentation (<strong>DTDA</strong>) approach for the COQE task. In particular, we leverage ChatGPT to generate domain-specific triple data. Subsequently, we utilize these generated data and existing Aspect Sentiment Triplet Extraction (ASTE) data for separate preliminary fine-tuning. On this basis, we employ the two fine-tuned triple models for warm-up and construct a dual-channel quintuple model using the unabridged quintuples. We evaluate our approach on three benchmark datasets: Camera-COQE, Car-COQE and Ele-COQE. Our approach exhibits substantial improvements versus pipeline-based, joint, and T5-based baselines. Notably, the DTDA method significantly outperforms the best pipeline method, with exact match <span><math><mi>F</mi></math></span>1-score increasing by 10.32%, 8.97%, and 10.65% on Camera-COQE, Car-COQE and Ele-COQE, respectively. More importantly, our data augmentation method can adapt to any baselines. When integrated with the current SOTA UniCOQE method, it further improves performance by 0.34%, 1.65%, and 2.22%, respectively. We will make all related models and source code publicly available upon acceptance.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"307 \",\"pages\":\"Article 112734\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2024-11-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705124013686\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705124013686","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

比较意见五元组提取(COQE)是情感分析中的一项重要任务,需要从比较句中提取五元组。每个五元组都包括一个主语、一个宾语、一个用于比较的共同方面、一个比较意见和一个不同的偏好。普遍依赖广泛注释的数据集从本质上限制了训练的效率。人工标注数据既耗时又耗力,尤其是标注五元数据。在此,我们针对 COQE 任务提出了一种双通道三重到五重数据增强(DTDA)方法。特别是,我们利用 ChatGPT 生成特定领域的三倍数据。随后,我们利用这些生成的数据和现有的方面情感三重提取(ASTE)数据分别进行初步微调。在此基础上,我们使用两个微调后的三元组模型进行热身,并使用未删节的五元组构建双通道五元组模型。我们在三个基准数据集上评估了我们的方法:Camera-COQE、Car-COQE 和 Ele-COQE。与基于流水线的方法、联合方法和基于 T5 的基线方法相比,我们的方法有了很大的改进。值得注意的是,DTDA 方法明显优于最佳流水线方法,在 Camera-COQE、Car-COQE 和 Ele-COQE 上,精确匹配的 F1 分数分别提高了 10.32%、8.97% 和 10.65%。更重要的是,我们的数据增强方法可以适应任何基线。当与当前的 SOTA UniCOQE 方法集成时,其性能分别进一步提高了 0.34%、1.65% 和 2.22%。我们将在获得认可后公开所有相关模型和源代码。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
DTDA: Dual-channel Triple-to-quintuple Data Augmentation for Comparative Opinion Quintuple Extraction
Comparative Opinion Quintuple Extraction (COQE) is an essential task in sentiment analysis that entails the extraction of quintuples from comparative sentences. Each quintuple comprises a subject, an object, a shared aspect for comparison, a comparative opinion and a distinct preference. The prevalent reliance on extensively annotated datasets inherently constrains the efficiency of training. Manual data labeling is both time-consuming and labor-intensive, especially labeling quintuple data. Herein, we propose a Dual-channel Triple-to-quintuple Data Augmentation (DTDA) approach for the COQE task. In particular, we leverage ChatGPT to generate domain-specific triple data. Subsequently, we utilize these generated data and existing Aspect Sentiment Triplet Extraction (ASTE) data for separate preliminary fine-tuning. On this basis, we employ the two fine-tuned triple models for warm-up and construct a dual-channel quintuple model using the unabridged quintuples. We evaluate our approach on three benchmark datasets: Camera-COQE, Car-COQE and Ele-COQE. Our approach exhibits substantial improvements versus pipeline-based, joint, and T5-based baselines. Notably, the DTDA method significantly outperforms the best pipeline method, with exact match F1-score increasing by 10.32%, 8.97%, and 10.65% on Camera-COQE, Car-COQE and Ele-COQE, respectively. More importantly, our data augmentation method can adapt to any baselines. When integrated with the current SOTA UniCOQE method, it further improves performance by 0.34%, 1.65%, and 2.22%, respectively. We will make all related models and source code publicly available upon acceptance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Knowledge-Based Systems
Knowledge-Based Systems 工程技术-计算机:人工智能
CiteScore
14.80
自引率
12.50%
发文量
1245
审稿时长
7.8 months
期刊介绍: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信