Span-level emotion-cause-category triplet extraction with instruction tuning LLMs and data augmentation

IF 6.6 · Zone 1, Computer Science · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing · Pub Date: 2025-09-17 · DOI: 10.1016/j.asoc.2025.113938

Xiangju Li, Dong Yang, Xiaogang Zhu, Faliang Huang, Peng Zhang, Zhongying Zhao

Applied Soft Computing, volume 185, Article 113938. Source: https://www.sciencedirect.com/science/article/pii/S1568494625012517

Citations: 0

Abstract


Span-level emotion-cause-category triplet extraction with instruction tuning LLMs and data augmentation

Span-level emotion-cause-category triplet extraction is a fine-grained task in emotion cause analysis that aims to identify emotion spans, cause spans, and their corresponding emotion categories from documents. Existing methods, including clause-level emotion-cause pair extraction and span-level emotion-cause detection, often suffer from redundant information and difficulties in accurately classifying emotion categories, particularly when emotions are expressed implicitly or ambiguously. To overcome these challenges, this study explores a fine-grained approach to span-level emotion-cause-category triplet extraction and introduces an innovative framework that leverages instruction tuning and data augmentation techniques based on large language models. The proposed method employs task-specific triplet extraction instructions and utilizes low-rank adaptation to fine-tune large language models, eliminating the necessity for intricate task-specific architectures. Furthermore, an LLM-based data augmentation strategy is developed to address data scarcity by guiding large language models in generating high-quality synthetic training data. Extensive experimental evaluations demonstrate that the proposed approach significantly outperforms existing baseline methods, achieving at least a 12.8 % improvement in span-level emotion-cause-category triplet extraction metrics. The results demonstrate the method’s effectiveness and robustness, offering a promising avenue for advancing research in emotion cause analysis.
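The abstract describes task-specific triplet extraction instructions but does not give their wording. The following is a minimal sketch of what such an instruction and its output handling might look like, not the authors' implementation. The prompt text, the `emotion | cause | category` output format, the `CATEGORIES` label set, and the function names are all assumptions made for illustration. The exact-match F1 is one common way to score triplets and may differ from the metric used in the paper. Low-rank adaptation fine-tuning and the data augmentation step are not shown.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (emotion span, cause span, emotion category)

# Hypothetical label set; the source does not list the categories it uses.
CATEGORIES = ("happiness", "sadness", "anger", "fear", "surprise", "disgust")


def build_instruction(document: str) -> str:
    """Wrap a document in an extraction instruction (assumed wording)."""
    return (
        "Extract every (emotion span, cause span, emotion category) triplet "
        "from the document. Copy spans verbatim from the text. "
        f"Choose the category from: {', '.join(CATEGORIES)}. "
        "Write one triplet per line as: emotion | cause | category\n\n"
        f"Document: {document}\nTriplets:"
    )


def parse_triplets(generated: str) -> List[Triplet]:
    """Parse model output lines of the form 'emotion | cause | category'."""
    triplets: List[Triplet] = []
    for line in generated.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip malformed lines and categories outside the assumed label set.
        if len(parts) == 3 and all(parts) and parts[2].lower() in CATEGORIES:
            triplets.append((parts[0], parts[1], parts[2].lower()))
    return triplets


def triplet_f1(predicted: List[Triplet], gold: List[Triplet]) -> float:
    """Exact-match F1 over triplets (an assumed scoring choice)."""
    pred_set, gold_set = set(predicted), set(gold)
    hits = len(pred_set & gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

In a setup like this, `build_instruction` would supply the input side of each fine-tuning example, the gold triplets rendered in the same line format would supply the target, and `parse_triplets` would turn generated text back into tuples for scoring.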


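Source journal

Applied Soft Computing (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months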

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.