A randomized prospective study of a hybrid rule- and data-driven virtual patient

IF 1.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Natural Language Engineering Pub Date : 2022-09-23 DOI:10.1017/s1351324922000420

Adam Stiff, Michael White, E. Fosler-Lussier, Lifeng Jin, Evan Jaffe, D. Danforth


Citations: 3

Abstract

Randomized prospective studies represent the gold standard for experimental design. In this paper, we present a randomized prospective study to validate the benefits of combining rule-based and data-driven natural language understanding methods in a virtual patient dialogue system. The system uses a rule-based pattern matching approach together with a machine learning (ML) approach in the form of a text-based convolutional neural network, combining the two methods with a simple logistic regression model to choose between their predictions for each dialogue turn. In an earlier, retrospective study, the hybrid system yielded a nearly 50% error reduction on our initial data, in part due to the differential performance between the two methods as a function of label frequency. Given these gains, and considering that our hybrid approach is unique among virtual patient systems, we compare the hybrid system to the rule-based system by itself in a randomized prospective study. We evaluate 110 unique medical student subjects interacting with the system over 5,296 conversation turns, to verify whether similar gains are observed in a deployed system. This prospective study broadly confirms the findings from the earlier one but also highlights important deficits in our training data. The hybrid approach still improves over either rule-based or ML approaches individually, even handling unseen classes with some success. However, we observe that live subjects ask more out-of-scope questions than expected. To better handle such questions, we investigate several modifications to the system combination component. These show significant overall accuracy improvements and modest F1 improvements on out-of-scope queries in an offline evaluation. We provide further analysis to characterize the difficulty of the out-of-scope problem that we have identified, as well as to suggest future improvements over the baseline we establish here.
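The combination step described in the abstract, a logistic regression model that chooses between the rule-based and ML predictions for each dialogue turn, can be illustrated with a minimal sketch. The abstract does not specify the model's input features, so the two features used here (whether a rule pattern matched, and the ML classifier's confidence), the toy training data, and the names `train_chooser` and `choose` are all hypothetical. This is not the authors' implementation, only one plausible shape for such a chooser.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_chooser(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent.

    features: list of per-turn feature vectors (hypothetical features).
    labels:   1 if the ML prediction should be trusted, 0 for the rule-based one.
    """
    n_feats = len(features[0])
    weights = [0.0] * n_feats
    bias = 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_feats
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error for this turn under the current parameters.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def choose(weights, bias, rule_pred, ml_pred, feats):
    """Return the ML label if the chooser favors it, else the rule-based label."""
    p_ml = sigmoid(bias + sum(w * xi for w, xi in zip(weights, feats)))
    return ml_pred if p_ml >= 0.5 else rule_pred

# Toy data (assumed): [rule_pattern_matched (0/1), ml_confidence].
train_x = [[1, 0.30], [1, 0.45], [1, 0.95], [0, 0.90],
           [0, 0.70], [1, 0.20], [0, 0.85], [1, 0.50]]
train_y = [0, 0, 1, 1, 1, 0, 1, 0]

weights, bias = train_chooser(train_x, train_y)

# A turn where a rule matched and the ML classifier is unsure.
print(choose(weights, bias, "rule_label", "ml_label", [1, 0.20]))
# A turn where no rule matched and the ML classifier is confident.
print(choose(weights, bias, "rule_label", "ml_label", [0, 0.90]))
```

The point of learning the choice per turn, rather than always preferring one component, is that each method can be favored where it tends to be more reliable, which is consistent with the abstract's remark that the two methods perform differently as a function of label frequency.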




Source journal

Natural Language Engineering (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 5.90
Self-citation rate: 12.00%
Articles published: not listed
Review time: >12 weeks

Journal description: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multimodal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.