Qingting Xu, Kaisong Song, Yangyang Kang, Chaoqun Liu, Yu Hong, Guodong Zhou
Knowledge-Based Systems, Volume 307, Article 112734 (published 2024-11-20). DOI: 10.1016/j.knosys.2024.112734
DTDA: Dual-channel Triple-to-quintuple Data Augmentation for Comparative Opinion Quintuple Extraction
Comparative Opinion Quintuple Extraction (COQE) is an essential task in sentiment analysis that entails extracting quintuples from comparative sentences. Each quintuple comprises a subject, an object, a shared aspect of comparison, a comparative opinion, and a distinct preference. The prevalent reliance on extensively annotated datasets inherently constrains training efficiency, and manual labeling is time-consuming and labor-intensive, especially for quintuple data. Herein, we propose a Dual-channel Triple-to-quintuple Data Augmentation (DTDA) approach for the COQE task. Specifically, we leverage ChatGPT to generate domain-specific triple data. We then use these generated data and existing Aspect Sentiment Triplet Extraction (ASTE) data for two separate preliminary fine-tuning runs. On this basis, we use the two fine-tuned triple models for warm-up and construct a dual-channel quintuple model using the unabridged quintuples. We evaluate our approach on three benchmark datasets: Camera-COQE, Car-COQE, and Ele-COQE. Our approach yields substantial improvements over pipeline-based, joint, and T5-based baselines. Notably, DTDA significantly outperforms the best pipeline method, increasing the exact-match F1-score by 10.32%, 8.97%, and 10.65% on Camera-COQE, Car-COQE, and Ele-COQE, respectively. More importantly, our data augmentation method can be applied to any baseline. When integrated with the current state-of-the-art (SOTA) method, UniCOQE, it further improves performance by 0.34%, 1.65%, and 2.22%, respectively. We will make all related models and source code publicly available upon acceptance.
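To make the quintuple structure and the exact-match metric concrete, here is a minimal Python sketch. It assumes that a predicted quintuple counts as correct only if all five elements match a gold quintuple from the same sentence, that quintuples are compared as per-sentence sets and micro-averaged over the corpus, and that preference labels are strings such as "BETTER" or "WORSE". The names `Quintuple` and `exact_match_f1` are hypothetical and do not come from the authors' code, whose evaluation details may differ.

```python
from typing import Iterable, NamedTuple, Set


class Quintuple(NamedTuple):
    """One comparative opinion (hypothetical container, not the authors' code)."""
    subject: str     # entity being compared, e.g. "Canon"
    obj: str         # entity it is compared against, e.g. "Nikon"
    aspect: str      # shared aspect of comparison, e.g. "battery"
    opinion: str     # comparative opinion phrase, e.g. "longer"
    preference: str  # preference label; "BETTER"/"WORSE" is an assumed label set


def exact_match_f1(gold: Iterable[Set[Quintuple]],
                   pred: Iterable[Set[Quintuple]]) -> float:
    """Micro-averaged exact-match F1 over sentences.

    A predicted quintuple is a true positive only if all five fields equal
    those of a gold quintuple in the same sentence (assumed matching rule).
    """
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):  # one (gold, predicted) pair per sentence
        tp += len(g & p)           # exact matches in this sentence
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage: one sentence, one gold quintuple, two predictions (one spurious).
gold = [{Quintuple("Canon", "Nikon", "battery", "longer", "BETTER")}]
pred = [{Quintuple("Canon", "Nikon", "battery", "longer", "BETTER"),
         Quintuple("Canon", "Nikon", "price", "higher", "WORSE")}]
print(round(exact_match_f1(gold, pred), 4))  # precision 0.5, recall 1.0 -> 0.6667
```

Under this strict rule, a quintuple with four correct elements and one wrong element earns no credit, which is what makes exact-match F1 a demanding measure for a five-element output.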
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.