Span-level emotion-cause-category triplet extraction with instruction tuning LLMs and data augmentation

Impact Factor 6.6 | Zone 1 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence
Xiangju Li, Dong Yang, Xiaogang Zhu, Faliang Huang, Peng Zhang, Zhongying Zhao
Applied Soft Computing, Volume 185, Article 113938. Journal article, published 2025-09-17.
DOI: 10.1016/j.asoc.2025.113938
URL: https://www.sciencedirect.com/science/article/pii/S1568494625012517
Citations: 0

Abstract

Span-level emotion-cause-category triplet extraction is a fine-grained task in emotion cause analysis that aims to identify emotion spans, cause spans, and their corresponding emotion categories from documents. Existing methods, including clause-level emotion-cause pair extraction and span-level emotion-cause detection, often suffer from redundant information and difficulties in accurately classifying emotion categories, particularly when emotions are expressed implicitly or ambiguously. To overcome these challenges, this study explores a fine-grained approach to span-level emotion-cause-category triplet extraction and introduces an innovative framework that leverages instruction tuning and data augmentation techniques based on large language models. The proposed method employs task-specific triplet extraction instructions and utilizes low-rank adaptation to fine-tune large language models, eliminating the necessity for intricate task-specific architectures. Furthermore, an LLM-based data augmentation strategy is developed to address data scarcity by guiding large language models in generating high-quality synthetic training data. Extensive experimental evaluations demonstrate that the proposed approach significantly outperforms existing baseline methods, achieving at least a 12.8% improvement in span-level emotion-cause-category triplet extraction metrics. The results demonstrate the method’s effectiveness and robustness, offering a promising avenue for advancing research in emotion cause analysis.
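As a rough illustration of the task framing described in the abstract, the sketch below shows one way a task-specific extraction instruction could be built, how a model's text response could be parsed into (emotion span, cause span, category) triplets, and how exact-match triplet F1 could be scored. Everything here is an assumption made for illustration: the instruction wording, the `CATEGORIES` label set, the `(emotion | cause | category)` output format, and the exact-match scoring rule are hypothetical and are not taken from the paper, which may use different prompts, labels, and metrics. No model is called; the response string is hand-written.

```python
import re
from typing import Set, Tuple

# (emotion span, cause span, emotion category)
Triplet = Tuple[str, str, str]

# Hypothetical label set; the abstract does not list the categories used.
CATEGORIES = ("happiness", "sadness", "anger", "fear", "surprise", "disgust")


def build_instruction(document: str) -> str:
    """Wrap a document in an extraction instruction (illustrative wording only)."""
    return (
        "Extract every (emotion span, cause span, emotion category) triplet "
        "from the document. Copy spans verbatim. "
        f"Choose the category from: {', '.join(CATEGORIES)}. "
        "Answer with one triplet per line as (emotion | cause | category).\n\n"
        f"Document: {document}"
    )


# Matches "(a | b | c)" where no field contains '|' or parentheses.
_TRIPLET_RE = re.compile(r"\(([^|()]+)\|([^|()]+)\|([^|()]+)\)")


def parse_triplets(response: str) -> Set[Triplet]:
    """Parse a text response into triplets, dropping unknown categories."""
    triplets: Set[Triplet] = set()
    for emotion, cause, category in _TRIPLET_RE.findall(response):
        category = category.strip().lower()
        if category in CATEGORIES:
            triplets.add((emotion.strip(), cause.strip(), category))
    return triplets


def triplet_f1(predicted: Set[Triplet], gold: Set[Triplet]) -> float:
    """Exact-match F1: a triplet counts only if all three fields match."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = "She was thrilled because she won the award."
    print(build_instruction(doc))
    # Hand-written stand-in for a model response; the second line has an
    # out-of-vocabulary category and is discarded by the parser.
    response = "(thrilled | she won the award | happiness)\n(upset | nothing | boredom)"
    predicted = parse_triplets(response)
    gold = {("thrilled", "she won the award", "happiness")}
    print(predicted, triplet_f1(predicted, gold))
```

Framing the output as plain text lines keeps the target compatible with a generative model fine-tuned on instruction-response pairs, at the cost of needing a tolerant parser for malformed responses.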
Source journal: Applied Soft Computing
Category: Engineering and Technology, Computer Science: Interdisciplinary Applications
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the website will be updated continuously with new articles and the publication time will be short.