Democratizing Chatbot Debugging: A Computational Framework for Evaluating and Explaining Inappropriate Chatbot Responses

Xu Han, Michelle X. Zhou, Yichen Wang, Wenxi Chen, Tom Yeh
{"title":"Democratizing Chatbot Debugging: A Computational Framework for Evaluating and Explaining Inappropriate Chatbot Responses","authors":"Xu Han, Michelle X. Zhou, Yichen Wang, Wenxi Chen, Tom Yeh","doi":"10.1145/3571884.3604308","DOIUrl":null,"url":null,"abstract":"Evaluating and understanding the inappropriateness of chatbot behaviors can be challenging, particularly for chatbot designers without technical backgrounds. To democratize the debugging process of chatbot misbehaviors for non-technical designers, we propose a framework that leverages dialogue act (DA) modeling to automate the evaluation and explanation of chatbot response inappropriateness. The framework first produces characterizations of context-aware DAs based on discourse analysis theory and real-world human-chatbot transcripts. It then automatically extracts features to identify the appropriateness level of a response and can explain the causes of the inappropriate response by examining the DA mismatch between the response and its conversational context. 
Using interview chatbots as a testbed, our framework achieves comparable classification accuracy with higher explainability and fewer computational resources than the deep learning baseline, making it the first step in utilizing DAs for chatbot response appropriateness evaluation and explanation.","PeriodicalId":127379,"journal":{"name":"Proceedings of the 5th International Conference on Conversational User Interfaces","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Conference on Conversational User Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3571884.3604308","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Evaluating and understanding the inappropriateness of chatbot behaviors can be challenging, particularly for chatbot designers without technical backgrounds. To democratize the debugging process of chatbot misbehaviors for non-technical designers, we propose a framework that leverages dialogue act (DA) modeling to automate the evaluation and explanation of chatbot response inappropriateness. The framework first produces characterizations of context-aware DAs based on discourse analysis theory and real-world human-chatbot transcripts. It then automatically extracts features to identify the appropriateness level of a response and can explain the causes of the inappropriate response by examining the DA mismatch between the response and its conversational context. Using interview chatbots as a testbed, our framework achieves comparable classification accuracy with higher explainability and fewer computational resources than the deep learning baseline, making it the first step in utilizing DAs for chatbot response appropriateness evaluation and explanation.
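The core idea in the abstract — classify a response's dialogue act (DA), check it against the DAs that are contextually appropriate, and explain failures as DA mismatches — can be illustrated with a minimal sketch. Everything below is hypothetical: the DA labels, the expectation table, and the toy rule-based classifier stand in for the paper's actual feature extraction and are not its implementation.

```python
# Hypothetical sketch of DA-mismatch-based appropriateness evaluation.
# The paper extracts learned features for DA identification; here a toy
# rule-based classifier stands in so the mismatch logic is self-contained.

EXPECTED_DAS = {
    # user-turn DA -> chatbot-response DAs judged appropriate in context
    "answer": {"acknowledgment", "follow-up-question"},
    "question": {"answer", "clarification-request"},
    "clarification-request": {"clarification"},
}

def classify_da(utterance: str) -> str:
    """Toy DA classifier (placeholder for the framework's extracted features)."""
    text = utterance.strip().lower()
    if text.endswith("?"):
        return "follow-up-question" if "you" in text else "clarification-request"
    if text.startswith(("thanks", "got it", "i see")):
        return "acknowledgment"
    return "answer"

def evaluate_response(user_da: str, response: str) -> tuple[bool, str]:
    """Return (is_appropriate, explanation) for a chatbot response."""
    response_da = classify_da(response)
    expected = EXPECTED_DAS.get(user_da, set())
    if response_da in expected:
        return True, f"'{response_da}' is appropriate after a user '{user_da}'"
    return False, (f"DA mismatch: got '{response_da}' after a user '{user_da}', "
                   f"expected one of {sorted(expected)}")

# An interview-chatbot follow-up question after a user answer is appropriate;
# a bare acknowledgment after a user question is flagged with an explanation.
ok, why = evaluate_response("answer", "I see, could you tell me more?")
bad, reason = evaluate_response("question", "Thanks for sharing!")
```

Because appropriateness is decided by a transparent table lookup over DA labels, the same check that classifies a response also yields the explanation, which is the explainability advantage the abstract claims over an opaque deep learning baseline.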