H. P. T. Thu, B. N. Thai, V. H. Nguyen, Quoc Truong Do, Luong Chi Mai, Huyen Thi Minh Nguyen
2019 11th International Conference on Knowledge and Systems Engineering (KSE), published 2019-10-01. DOI: 10.1109/KSE.2019.8919342 (https://doi.org/10.1109/KSE.2019.8919342)
Recovering Capitalization for Automatic Speech Recognition of Vietnamese using Transformer and Chunk Merging
In the last few years, Automatic Speech Recognition (ASR) systems for Vietnamese have been used in various applications with exceptional results. Nevertheless, their output still has limitations, such as the absence of punctuation, capitalization, and standardized numeric data. These shortcomings make it difficult for readers to understand the context efficiently and for Natural Language Processing (NLP) tasks to perform well. Capitalization is one of the most critical factors for improving human readability, parsing, and Named Entity Recognition (NER). Additionally, Vietnamese ASR output has features of its own compared with English, such as lisp words, local words, compound words, and homophones. In this paper, we propose a method to recover capitalization for long-speech ASR transcriptions of Vietnamese using Transformer models and chunk merging. Furthermore, we perform decoding in parallel while improving the prediction accuracy.
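The abstract does not spell out the chunking or merging rules, so the following is only a minimal sketch of one plausible reading: split a long transcript into overlapping chunks, label the chunks in parallel, and at each overlap keep the words that sit further from a chunk edge. The names `split_into_chunks`, `merge_chunks`, `restore_capitalization`, and `toy_capitalize`, the parameters `chunk_size` and `overlap`, and the half-and-half merge rule are all assumptions made for illustration. `toy_capitalize` is a lookup-based stand-in for the Transformer model, not the authors' model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Tagger = Callable[[List[str]], List[str]]


def split_into_chunks(words: List[str], chunk_size: int, overlap: int) -> List[Tuple[int, List[str]]]:
    """Split words into chunks of chunk_size that share `overlap` words with their neighbour."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
        start += step
    return chunks


def merge_chunks(labelled_chunks: List[List[str]], overlap: int) -> List[str]:
    """Merge labelled chunks. In each overlap, the left chunk supplies the first
    half and the right chunk the second half, so edge words are discarded."""
    merged: List[str] = []
    last = len(labelled_chunks) - 1
    for i, labelled in enumerate(labelled_chunks):
        lo = 0 if i == 0 else overlap // 2
        hi = len(labelled) if i == last else len(labelled) - (overlap - overlap // 2)
        merged.extend(labelled[lo:hi])
    return merged


def restore_capitalization(words: List[str], tagger: Tagger, chunk_size: int = 6, overlap: int = 2) -> List[str]:
    """Label every chunk in parallel, then merge the results in order."""
    chunks = [chunk for _, chunk in split_into_chunks(words, chunk_size, overlap)]
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, which merging relies on.
        labelled = list(pool.map(tagger, chunks))
    return merge_chunks(labelled, overlap)


# Hypothetical stand-in for the Transformer: capitalize words found in a tiny lexicon.
PROPER_WORDS = {"hà", "nội", "việt", "nam"}


def toy_capitalize(chunk: List[str]) -> List[str]:
    return [w.capitalize() if w in PROPER_WORDS else w for w in chunk]


transcript = "hôm nay tôi đến hà nội để làm việc tại việt nam".split()
print(" ".join(restore_capitalization(transcript, toy_capitalize)))
```

Because each chunk is labelled independently, the chunks can be decoded concurrently. The overlap exists so that words near a chunk boundary, where the model sees little context on one side, can be taken from the neighbouring chunk instead.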