对不起，我不明白:改进语音用户界面测试

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering Pub Date : 2022-10-10 DOI:10.1145/3551349.3556934

Emanuela Guglielmi, Giovanni Rosa, Simone Scalabrino, G. Bavota, R. Oliveto

{"title":"对不起，我不明白:改进语音用户界面测试","authors":"Emanuela Guglielmi, Giovanni Rosa, Simone Scalabrino, G. Bavota, R. Oliveto","doi":"10.1145/3551349.3556934","DOIUrl":null,"url":null,"abstract":"Voice-based virtual assistants are becoming increasingly popular. Such systems provide frameworks to developers on which they can build their own apps. End-users can interact with such apps through a Voice User Interface (VUI), which allows to use natural language commands to perform actions. Testing such apps is far from trivial: The same command can be expressed in different ways. To support developers in testing VUIs, Deep Learning (DL)-based tools have been integrated in the development environments (e.g., the Alexa Developer Console, or ADC) to generate paraphrases for the commands (seed utterances) specified by the developers. Such tools, however, generate few paraphrases that do not always cover corner cases. In this paper, we introduce VUI-UPSET, a novel approach that aims at adapting chatbot-testing approaches to VUI-testing. Both systems, indeed, provide a similar natural-language-based interface to users. We conducted an empirical study to understand how VUI-UPSET compares to existing approaches in terms of (i) correctness of the generated paraphrases, and (ii) capability of revealing bugs. Multiple authors analyzed 5,872 generated paraphrases, with a total of 13,310 manual evaluations required for such a process. Our results show that, while the DL-based tool integrated in the ADC generates a higher percentage of meaningful paraphrases compared to VUI-UPSET, VUI-UPSET generates more bug-revealing paraphrases. This allows developers to test more thoroughly their apps at the cost of discarding a higher number of irrelevant paraphrases.","PeriodicalId":197939,"journal":{"name":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Sorry, I don’t Understand: Improving Voice User Interface Testing\",\"authors\":\"Emanuela Guglielmi, Giovanni Rosa, Simone Scalabrino, G. Bavota, R. Oliveto\",\"doi\":\"10.1145/3551349.3556934\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Voice-based virtual assistants are becoming increasingly popular. Such systems provide frameworks to developers on which they can build their own apps. End-users can interact with such apps through a Voice User Interface (VUI), which allows to use natural language commands to perform actions. Testing such apps is far from trivial: The same command can be expressed in different ways. To support developers in testing VUIs, Deep Learning (DL)-based tools have been integrated in the development environments (e.g., the Alexa Developer Console, or ADC) to generate paraphrases for the commands (seed utterances) specified by the developers. Such tools, however, generate few paraphrases that do not always cover corner cases. In this paper, we introduce VUI-UPSET, a novel approach that aims at adapting chatbot-testing approaches to VUI-testing. Both systems, indeed, provide a similar natural-language-based interface to users. We conducted an empirical study to understand how VUI-UPSET compares to existing approaches in terms of (i) correctness of the generated paraphrases, and (ii) capability of revealing bugs. Multiple authors analyzed 5,872 generated paraphrases, with a total of 13,310 manual evaluations required for such a process. Our results show that, while the DL-based tool integrated in the ADC generates a higher percentage of meaningful paraphrases compared to VUI-UPSET, VUI-UPSET generates more bug-revealing paraphrases. This allows developers to test more thoroughly their apps at the cost of discarding a higher number of irrelevant paraphrases.\",\"PeriodicalId\":197939,\"journal\":{\"name\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3551349.3556934\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3551349.3556934","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

基于语音的虚拟助手正变得越来越受欢迎。这样的系统为开发人员提供了框架，他们可以在这个框架上构建自己的应用程序。最终用户可以通过语音用户界面(VUI)与这些应用程序进行交互，该界面允许使用自然语言命令执行操作。测试这样的应用程序绝非易事:同一个命令可以用不同的方式表达。为了支持开发人员测试VUIs，基于深度学习(DL)的工具已经集成到开发环境中(例如，Alexa Developer Console或ADC)，以生成开发人员指定的命令(种子话语)的释义。然而，这样的工具产生的释义很少，它们并不总是涵盖极端情况。在本文中，我们介绍了VUI-UPSET，这是一种旨在将聊天机器人测试方法应用于vui测试的新方法。实际上，这两个系统都为用户提供了类似的基于自然语言的界面。我们进行了一项实证研究，以了解VUI-UPSET与现有方法在以下方面的比较:(i)生成释义的正确性，以及(ii)揭示bug的能力。多个作者分析了5,872个生成的释义，这样一个过程总共需要13,310个手动评估。我们的结果表明，与VUI-UPSET相比，集成在ADC中的基于dll的工具生成了更高比例的有意义的释义，而VUI-UPSET生成了更多揭示bug的释义。这让开发者能够更彻底地测试他们的应用，而代价是放弃更多不相关的释义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sorry, I don’t Understand: Improving Voice User Interface Testing

Voice-based virtual assistants are becoming increasingly popular. Such systems provide frameworks to developers on which they can build their own apps. End-users can interact with such apps through a Voice User Interface (VUI), which allows to use natural language commands to perform actions. Testing such apps is far from trivial: The same command can be expressed in different ways. To support developers in testing VUIs, Deep Learning (DL)-based tools have been integrated in the development environments (e.g., the Alexa Developer Console, or ADC) to generate paraphrases for the commands (seed utterances) specified by the developers. Such tools, however, generate few paraphrases that do not always cover corner cases. In this paper, we introduce VUI-UPSET, a novel approach that aims at adapting chatbot-testing approaches to VUI-testing. Both systems, indeed, provide a similar natural-language-based interface to users. We conducted an empirical study to understand how VUI-UPSET compares to existing approaches in terms of (i) correctness of the generated paraphrases, and (ii) capability of revealing bugs. Multiple authors analyzed 5,872 generated paraphrases, with a total of 13,310 manual evaluations required for such a process. Our results show that, while the DL-based tool integrated in the ADC generates a higher percentage of meaningful paraphrases compared to VUI-UPSET, VUI-UPSET generates more bug-revealing paraphrases. This allows developers to test more thoroughly their apps at the cost of discarding a higher number of irrelevant paraphrases.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

自引率

0.00%

发文量