Chatbot Dialog Design for Improved Human Performance in Domain Knowledge Discovery

IF 4.4 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Human-Machine Systems, vol. 55, no. 2, pp. 207-222. Pub Date: 2025-01-07. DOI: 10.1109/THMS.2024.3514742

Roland Oruche; Xiyao Cheng; Zian Zeng; Audrey Vazzana; MD Ashraful Goni; Bruce Wang Shibo; Sai Keerthana Goruganthu; Kerk Kee; Prasad Calyam

IEEE Xplore: https://ieeexplore.ieee.org/document/10832392/

Citations: 0

Abstract

The advent of machine learning (ML) has led to the widespread development of task-oriented dialog systems for scientific applications (e.g., science gateways), where voluminous information sources are retrieved and curated for domain users. Yet designing chatbot dialog systems that achieve widespread diffusion among scientific communities remains a challenge. In this article, we propose a novel Vidura advisor design framework (VADF) for developing dialog systems for information retrieval (IR) and question-answering (QA) tasks, while enabling the quantification of system utility based on human performance in diverse application environments. Our framework adopts a socio-technical approach to dialog system design that utilizes domain expert feedback and features a sparse retriever that uses linear interpolation smoothing to enable accurate responses in QA settings. We apply VADF to an exemplar science gateway, viz. KnowCOVID-19, and conduct experiments that demonstrate the utility of dialog systems in terms of IR and QA performance, application utility, and perceived adoption. Experimental results on scientific literature datasets show that our VADF approach significantly improves IR performance over retriever baselines (up to a 5% increase) and QA performance over large language models (LLMs) such as ChatGPT (up to a 43% increase). In addition, through a usability survey, we observe that measuring application utility and human performance when applying VADF to KnowCOVID-19 translates into an increase in perceived community adoption.
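The abstract says only that the sparse retriever uses linear interpolation smoothing. One standard form of this idea in sparse, term-based retrieval is Jelinek-Mercer smoothing of a query-likelihood model, where each query term's probability is the interpolation (1 - λ)·P(t|d) + λ·P(t|C) of the document model and the collection model. The Python sketch below illustrates that general technique under this assumption; it is not the VADF implementation, and the function names `jm_score` and `rank`, the naive whitespace tokenizer, the toy documents, and the default λ = 0.1 are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def jm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.1):
    """Log query likelihood of one document under Jelinek-Mercer smoothing.

    Each query term's probability linearly interpolates the document model
    P(t|d) with the collection model P(t|C):
        p(t) = (1 - lam) * P(t|d) + lam * P(t|C)
    collection_counts is a Counter of term frequencies over all documents.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1] so unseen terms keep nonzero probability")
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = doc_counts[t] / doc_len if doc_len else 0.0
        p_coll = collection_counts[t] / collection_len if collection_len else 0.0
        if p_coll == 0.0:
            # Term never occurs in the collection, so its probability would be
            # zero for every document; skipping it avoids log(0) without
            # changing the relative ranking.
            continue
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score


def rank(query, docs, lam=0.1):
    """Return document indices sorted from best to worst match (hypothetical helper)."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    collection_counts = Counter(t for doc in tokenized for t in doc)
    collection_len = sum(collection_counts.values())
    q = query.lower().split()
    scores = [jm_score(q, doc, collection_counts, collection_len, lam) for doc in tokenized]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus, made up for illustration.
docs = [
    "remdesivir trial results for covid-19 patients",
    "vaccine efficacy against covid-19 variants",
    "protein folding with deep learning",
]
print(rank("covid-19 vaccine efficacy", docs))  # [1, 0, 2]
```

Because λ > 0, a document missing some query terms still receives a finite score from the collection model instead of being eliminated outright; a larger λ gives more weight to collection-wide statistics relative to the individual document.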




Source Journal

IEEE Transactions on Human-Machine Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS)

CiteScore: 7.10

Self-citation rate: 11.10%

Articles published: 136

Journal description: The scope of the IEEE Transactions on Human-Machine Systems includes the field of human-machine systems. It covers human systems and human-organizational interactions, including cognitive ergonomics, system test and evaluation, and human information processing concerns in systems and organizations.