{"title":"ASR - VLSP 2021:越南ASR任务的高效变压器方法","authors":"Toan Truong Tien","doi":"10.25073/2588-1086/vnucsce.325","DOIUrl":null,"url":null,"abstract":"Various techniques have been applied to enhance automatic speech recognition during the last few years. Reaching auspicious performance in natural language processing makes Transformer architecture becoming the de facto standard in numerous domains. This paper first presents our effort to collect a 3000-hour Vietnamese speech corpus. After that, we introduce the system used for VLSP 2021 ASR task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72% and gets second place on the private test. Experimental results indicate that the proposed approach dominates traditional methods with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation on the label does not improve the efficiency of the ASR system.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ASR - VLSP 2021: An Efficient Transformer-based Approach for Vietnamese ASR Task\",\"authors\":\"Toan Truong Tien\",\"doi\":\"10.25073/2588-1086/vnucsce.325\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Various techniques have been applied to enhance automatic speech recognition during the last few years. Reaching auspicious performance in natural language processing makes Transformer architecture becoming the de facto standard in numerous domains. This paper first presents our effort to collect a 3000-hour Vietnamese speech corpus. After that, we introduce the system used for VLSP 2021 ASR task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72% and gets second place on the private test. Experimental results indicate that the proposed approach dominates traditional methods with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation on the label does not improve the efficiency of the ASR system.\",\"PeriodicalId\":416488,\"journal\":{\"name\":\"VNU Journal of Science: Computer Science and Communication Engineering\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"VNU Journal of Science: Computer Science and Communication Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.25073/2588-1086/vnucsce.325\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"VNU Journal of Science: Computer Science and Communication Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.25073/2588-1086/vnucsce.325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
ASR - VLSP 2021: An Efficient Transformer-based Approach for Vietnamese ASR Task
Various techniques have been applied to enhance automatic speech recognition during the last few years. Reaching auspicious performance in natural language processing makes Transformer architecture becoming the de facto standard in numerous domains. This paper first presents our effort to collect a 3000-hour Vietnamese speech corpus. After that, we introduce the system used for VLSP 2021 ASR task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72% and gets second place on the private test. Experimental results indicate that the proposed approach dominates traditional methods with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation on the label does not improve the efficiency of the ASR system.