差分测试能改善自动语音识别系统吗?

2021 IEEE International Conference on Software Maintenance and Evolution (ICSME) Pub Date : 2021-09-01 DOI:10.26226/morressier.613b5418842293c031b5b5ef

Muhammad Hilmi Asyrofi, Zhou Yang, Jieke Shi, Chu Wei Quan, David Lo

{"title":"差分测试能改善自动语音识别系统吗?","authors":"Muhammad Hilmi Asyrofi, Zhou Yang, Jieke Shi, Chu Wei Quan, David Lo","doi":"10.26226/morressier.613b5418842293c031b5b5ef","DOIUrl":null,"url":null,"abstract":"Due to the widespread adoption of Automatic Speech Recognition (ASR) systems in many critical domains, ensuring the quality of recognized transcriptions is of great importance. A recent work, CrossASR++, can automatically uncover many failures in ASR systems by taking advantage of the differential testing technique. It employs a Text-To-Speech (TTS) system to synthesize audios from texts and then reveals failed test cases by feeding them to multiple ASR systems for cross-referencing. However, no prior work tries to utilize the generated test cases to enhance the quality of ASR systems. In this paper, we explore the subsequent improvements brought by leveraging these test cases from two aspects, which we collectively refer to as a novel idea, evolutionary differential testing. On the one hand, we fine-tune a target ASR system on the corresponding test cases generated for it. On the other hand, we fine-tune a cross-referenced ASR system inside CrossASR++, with the hope to boost CrossASR++'s performance in uncovering more failed test cases. Our experiment results empirically show that the above methods to leverage the test cases can substantially improve both the target ASR system and CrossASR++ itself. After fine-tuning, the number of failed test cases uncovered decreases by 25.81% and the word error rate of the improved target ASR system drops by 45.81%. Moreover, by evolving just one cross-referenced ASR system, CrossASR++ can find 5.70%, 7.25%, 3.93%, and 1.52% more failed test cases for 4 target ASR systems, respectively.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Can Differential Testing Improve Automatic Speech Recognition Systems?\",\"authors\":\"Muhammad Hilmi Asyrofi, Zhou Yang, Jieke Shi, Chu Wei Quan, David Lo\",\"doi\":\"10.26226/morressier.613b5418842293c031b5b5ef\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the widespread adoption of Automatic Speech Recognition (ASR) systems in many critical domains, ensuring the quality of recognized transcriptions is of great importance. A recent work, CrossASR++, can automatically uncover many failures in ASR systems by taking advantage of the differential testing technique. It employs a Text-To-Speech (TTS) system to synthesize audios from texts and then reveals failed test cases by feeding them to multiple ASR systems for cross-referencing. However, no prior work tries to utilize the generated test cases to enhance the quality of ASR systems. In this paper, we explore the subsequent improvements brought by leveraging these test cases from two aspects, which we collectively refer to as a novel idea, evolutionary differential testing. On the one hand, we fine-tune a target ASR system on the corresponding test cases generated for it. On the other hand, we fine-tune a cross-referenced ASR system inside CrossASR++, with the hope to boost CrossASR++'s performance in uncovering more failed test cases. Our experiment results empirically show that the above methods to leverage the test cases can substantially improve both the target ASR system and CrossASR++ itself. After fine-tuning, the number of failed test cases uncovered decreases by 25.81% and the word error rate of the improved target ASR system drops by 45.81%. Moreover, by evolving just one cross-referenced ASR system, CrossASR++ can find 5.70%, 7.25%, 3.93%, and 1.52% more failed test cases for 4 target ASR systems, respectively.\",\"PeriodicalId\":205629,\"journal\":{\"name\":\"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26226/morressier.613b5418842293c031b5b5ef\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5ef","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

由于自动语音识别(ASR)系统在许多关键领域的广泛采用，确保识别转录的质量非常重要。最近的一项工作，crossasr++，可以通过利用差分测试技术自动发现ASR系统中的许多故障。它采用文本到语音(TTS)系统从文本合成音频，然后通过将它们提供给多个ASR系统进行交叉引用来揭示失败的测试用例。然而，没有先前的工作试图利用生成的测试用例来提高ASR系统的质量。在本文中，我们从两个方面探讨了利用这些测试用例所带来的后续改进，我们将其统称为一个新颖的思想，即进化差异测试。一方面，我们在为其生成的相应测试用例上对目标ASR系统进行微调。另一方面，我们在crossasr++中微调了一个交叉引用的ASR系统，希望通过发现更多失败的测试用例来提高crossasr++的性能。我们的实验结果经验地表明，上述利用测试用例的方法可以极大地改进目标ASR系统和crossasr++本身。经过微调后，发现的失败测试用例数量减少了25.81%，改进后的目标ASR系统的单词错误率下降了45.81%。此外，通过发展一个交叉引用的ASR系统，crossasr++可以分别为4个目标ASR系统发现5.70%，7.25%，3.93%和1.52%的失败测试用例。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Can Differential Testing Improve Automatic Speech Recognition Systems?

Due to the widespread adoption of Automatic Speech Recognition (ASR) systems in many critical domains, ensuring the quality of recognized transcriptions is of great importance. A recent work, CrossASR++, can automatically uncover many failures in ASR systems by taking advantage of the differential testing technique. It employs a Text-To-Speech (TTS) system to synthesize audios from texts and then reveals failed test cases by feeding them to multiple ASR systems for cross-referencing. However, no prior work tries to utilize the generated test cases to enhance the quality of ASR systems. In this paper, we explore the subsequent improvements brought by leveraging these test cases from two aspects, which we collectively refer to as a novel idea, evolutionary differential testing. On the one hand, we fine-tune a target ASR system on the corresponding test cases generated for it. On the other hand, we fine-tune a cross-referenced ASR system inside CrossASR++, with the hope to boost CrossASR++'s performance in uncovering more failed test cases. Our experiment results empirically show that the above methods to leverage the test cases can substantially improve both the target ASR system and CrossASR++ itself. After fine-tuning, the number of failed test cases uncovered decreases by 25.81% and the word error rate of the improved target ASR system drops by 45.81%. Moreover, by evolving just one cross-referenced ASR system, CrossASR++ can find 5.70%, 7.25%, 3.93%, and 1.52% more failed test cases for 4 target ASR systems, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)

自引率

0.00%

发文量