Automatic Speech Recognition in German: A Detailed Error Analysis

2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS). Pub Date: 2022-08-01. DOI: 10.1109/COINS54846.2022.9854978

Johannes Wirth, R. Peinl


Citations: 2

Abstract


The number of freely available neural-network-based systems for automatic speech recognition (ASR) is growing steadily, and their predictions are becoming increasingly reliable. However, trained models are typically evaluated solely on statistical metrics such as word error rate (WER) or character error rate (CER), which provide no insight into the nature or impact of the errors produced when predicting transcripts from speech input. This work presents a selection of ASR model architectures pretrained on German and evaluates them on a benchmark of diverse test datasets. It identifies prediction errors that occur across architectures, classifies them into categories, and traces the sources of errors in each category back to the training data and to other sources. Finally, it discusses ways to create higher-quality training datasets and more robust ASR systems.
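To illustrate why WER and CER alone say little about the nature of an error, the following is a minimal sketch of both metrics built on the standard Levenshtein edit distance. It is not the evaluation code used in the paper; the function names (`edit_distance`, `wer`, `cer`) and the German example sentences are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: two transcripts with one wrong word each.
reference = "wir treffen uns um zehn uhr"
spelling_slip = "wir treffen uns um zehn ur"    # harmless misspelling of "uhr"
meaning_change = "wir treffen uns um zwei uhr"  # "zehn" (ten) became "zwei" (two)

for name, hyp in [("spelling slip", spelling_slip), ("meaning change", meaning_change)]:
    print(f"{name}: WER={wer(reference, hyp):.3f} CER={cer(reference, hyp):.3f}")
```

Both hypotheses contain exactly one wrong word out of six, so they receive the same WER, even though only the second one changes the meaning of the sentence. This is the kind of difference that a categorized error analysis can surface and that an aggregate score cannot.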
