{"title":"基于多表排序损失的特征敏感负样本对话响应一致性评价","authors":"YeongJun Hwang, Dongjun Kang, JinYeong Bak","doi":"10.1016/j.engappai.2025.110609","DOIUrl":null,"url":null,"abstract":"<div><div>Automatic evaluation of dialogue coherency is crucial for developing high-quality dialogue systems. However, traditional evaluation metrics such as Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) have limitations when it comes to assessing diverse and creative responses because they heavily rely on reference responses. For learnable metrics which utilize contrastive learning, challenges are encountered due to the use of randomly selected negative samples that do not reflect conversational features (i.e. topic, emotion, intention) and the lack of granularity in assessing response appropriateness. To address these limitations, we propose the Feature sensitive Multi-Listwise Ranking (FMListR) response coherency evaluation model. This model aims to evaluate dialogue coherency in degrees while considering conversational sensitive features. This approach involves sampling feature-sensitive responses that share conversational features with ground truth responses and utilizing them as hard negative samples. The model is trained using Multi-Listwise Ranking (MListR) loss, which is designed to learn the ranking between negative samples and identify response features. The experimental results demonstrate that Feature sensitive Multi-Listwise Ranking exhibits stronger correlations with human judgment compared to other response coherency evaluation metrics. By considering conversational features and training the model using a specialized loss function, FMListR provides a more robust and accurate evaluation of dialogue coherency.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"150 ","pages":"Article 110609"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dialogue response coherency evaluation with feature sensitive negative sample using multi list-wise ranking loss\",\"authors\":\"YeongJun Hwang, Dongjun Kang, JinYeong Bak\",\"doi\":\"10.1016/j.engappai.2025.110609\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Automatic evaluation of dialogue coherency is crucial for developing high-quality dialogue systems. However, traditional evaluation metrics such as Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) have limitations when it comes to assessing diverse and creative responses because they heavily rely on reference responses. For learnable metrics which utilize contrastive learning, challenges are encountered due to the use of randomly selected negative samples that do not reflect conversational features (i.e. topic, emotion, intention) and the lack of granularity in assessing response appropriateness. To address these limitations, we propose the Feature sensitive Multi-Listwise Ranking (FMListR) response coherency evaluation model. This model aims to evaluate dialogue coherency in degrees while considering conversational sensitive features. This approach involves sampling feature-sensitive responses that share conversational features with ground truth responses and utilizing them as hard negative samples. 
The model is trained using Multi-Listwise Ranking (MListR) loss, which is designed to learn the ranking between negative samples and identify response features. The experimental results demonstrate that Feature sensitive Multi-Listwise Ranking exhibits stronger correlations with human judgment compared to other response coherency evaluation metrics. By considering conversational features and training the model using a specialized loss function, FMListR provides a more robust and accurate evaluation of dialogue coherency.</div></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":\"150 \",\"pages\":\"Article 110609\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-03-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197625006098\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625006098","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Dialogue response coherency evaluation with feature sensitive negative sample using multi list-wise ranking loss
Automatic evaluation of dialogue coherency is crucial for developing high-quality dialogue systems. However, traditional evaluation metrics such as Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) are poorly suited to assessing diverse and creative responses because they rely heavily on reference responses. Learnable metrics that use contrastive learning face two challenges: randomly selected negative samples do not reflect conversational features (i.e., topic, emotion, and intention), and the resulting metrics lack granularity in assessing response appropriateness. To address these limitations, we propose the Feature sensitive Multi-Listwise Ranking (FMListR) response coherency evaluation model, which evaluates dialogue coherency in graded degrees while accounting for sensitive conversational features. The approach samples feature-sensitive responses that share conversational features with the ground-truth response and uses them as hard negative samples. The model is trained with the Multi-Listwise Ranking (MListR) loss, which is designed to learn the ranking among negative samples and to identify response features. Experimental results demonstrate that FMListR correlates more strongly with human judgment than other response coherency evaluation metrics. By considering conversational features and training with a specialized loss function, FMListR provides a more robust and accurate evaluation of dialogue coherency.
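To make the training objective concrete, below is a minimal PyTorch sketch of a list-wise ranking loss in the spirit of MListR: the coherence scorer is pushed to rank the ground-truth response above feature-sensitive hard negatives, which in turn rank above random negatives. The abstract does not give the exact formulation, so the ListMLE (Plackett-Luce) form, the function names, and the three-tier candidate ordering are illustrative assumptions, not the paper's actual loss.

```python
# Hedged sketch: a list-wise ranking loss over one positive response and
# several negatives. Assumes each batch row's candidates are pre-sorted in
# the desired rank order: [ground truth, hard negatives..., random negatives...].
import torch

def listwise_ranking_loss(scores: torch.Tensor) -> torch.Tensor:
    """ListMLE negative log-likelihood of the target permutation.

    scores: (batch, list_size) coherence scores, columns already in the
    desired rank order (best candidate first).
    """
    # log P(permutation) = sum_k [ s_k - logsumexp(s_k, ..., s_{n-1}) ].
    # Reverse-cumulative logsumexp gives the suffix normalizer at each rank.
    suffix_lse = torch.logcumsumexp(scores.flip(dims=[1]), dim=1).flip(dims=[1])
    log_likelihood = (scores - suffix_lse).sum(dim=1)
    return -log_likelihood.mean()

# Usage: 1 ground-truth + 2 hypothetical hard negatives + 2 random negatives.
batch, n_candidates = 8, 5
scores = torch.randn(batch, n_candidates, requires_grad=True)
loss = listwise_ranking_loss(scores)
loss.backward()
```

A list-wise objective of this kind differs from a plain contrastive (pairwise) loss in that it supervises the full ordering over negatives of varying hardness, which is what allows coherency to be learned in degrees rather than as a binary positive/negative decision.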
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advancements emerging across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.