Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen
{"title":"利用语音、语义信息和n -最优假设的有效ASR纠错","authors":"Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen","doi":"10.23919/APSIPAASC55919.2022.9979951","DOIUrl":null,"url":null,"abstract":"Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to the synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules to improve recognition performance by refining ASR output sentences, which fall roughly into two categories. The first category of methods is ASR N-best hypothesis reranking. ASR N-best hypothesis reranking aims to find the oracle hypothesis with the lowest word error rate from a given N-best hypothesis list. The other category of methods take inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors of ASR output sentences. In this paper, we attempt to integrate the above two methods into the ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely-used AISHELL-l dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in relation to some top-of-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses\",\"authors\":\"Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen\",\"doi\":\"10.23919/APSIPAASC55919.2022.9979951\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to the synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules to improve recognition performance by refining ASR output sentences, which fall roughly into two categories. The first category of methods is ASR N-best hypothesis reranking. ASR N-best hypothesis reranking aims to find the oracle hypothesis with the lowest word error rate from a given N-best hypothesis list. The other category of methods take inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors of ASR output sentences. In this paper, we attempt to integrate the above two methods into the ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely-used AISHELL-l dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in relation to some top-of-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.\",\"PeriodicalId\":382967,\"journal\":{\"name\":\"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"volume\":\"126 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/APSIPAASC55919.2022.9979951\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9979951","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
摘要
由于神经模型架构和训练算法的协同突破,自动语音识别(ASR)最近取得了显著的成功,并达到了与人类相当的水平。然而,在许多实际用例中,ASR的性能仍然远非完美。设计和开发可行的后处理模块,通过改进ASR输出句子来提高识别性能,这方面的研究兴趣激增,大致分为两类。第一类方法是ASR n -最优假设重排序。ASR n -最佳假设重排序旨在从给定的n -最佳假设列表中找到单词错误率最低的oracle假设。另一类方法的灵感来自汉语拼写纠正(CSC)或英语拼写纠正(ESC),寻求检测和纠正ASR输出句子的文本级错误。在本文中,我们尝试将上述两种方法整合到ASR误差校正(AEC)模块中,并探讨不同类型的特征对AEC的影响。在广泛使用的ahell -l数据集上进行的实证实验表明,与一些顶级的AEC方法相比,我们提出的方法可以显著降低基线ASR转录本的单词错误率(WER),从而证明了该方法的有效性和实际可行性。
Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses
Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to the synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules to improve recognition performance by refining ASR output sentences, which fall roughly into two categories. The first category of methods is ASR N-best hypothesis reranking. ASR N-best hypothesis reranking aims to find the oracle hypothesis with the lowest word error rate from a given N-best hypothesis list. The other category of methods take inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors of ASR output sentences. In this paper, we attempt to integrate the above two methods into the ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely-used AISHELL-l dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in relation to some top-of-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.