Title: ASR - VLSP 2021: Semi-supervised Ensemble Model for Vietnamese Automatic Speech Recognition
Authors: Phạm Việt Thành, Le Duc Cuong, Dao Dang Huy, Luu Duc Thanh, Nguyen Duc Tan, Dang Trung Duc Anh, Nguyen Thi Thu Trang
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2022-06-30)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.332
Abstract: Automatic speech recognition (ASR) has made great advances with the arrival of end-to-end architectures. Semi-supervised learning methods, which can exploit unlabeled data, have contributed largely to the success of ASR systems, giving them the ability to surpass human performance. However, most research focuses on developing these techniques for English speech recognition, which raises concerns about their performance in other languages, especially in low-resource scenarios. In this paper, we propose a Vietnamese ASR system for the VLSP 2021 Automatic Speech Recognition Shared Task. The system is based on the Wav2vec 2.0 framework, combined with self-training and several data augmentation techniques. Experimental results show that on the ASR-T1 test set of the shared task, our proposed model achieved a remarkable result, ranking second with a Syllable Error Rate (SyER) of 11.08%.
{"title":"NER - VLSP 2021: A Span-Based Model for Named Entity Recognition Task with Co-teaching+ Training Strategy","authors":"Pham Hoai Phu Thinh, Vu Tran Duy, Do Tran Anh Duc","doi":"10.25073/2588-1086/vnucsce.328","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.328","url":null,"abstract":"Named entities containing other named entities inside are referred to as nested entities, which commonly exist in news articles and other documents. However, most studies in the field of Vietnamese named entity recognition entirely ignore nested entities. In this report, we describe our system at VLSP 2021 evaluation campaign, adopting the technique from dependency parsing to tackle the problem of nested entities. We also apply Coteaching+ technique to enhance the overall performance and propose an ensemble algorithm to combine predictions. Experimental results show that the ensemble method achieves the best F1 score on the test set at VLSP 2021.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121556302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSP 2021 - ASR Challenge for Vietnamese Automatic Speech Recognition","authors":"Van Hai Do","doi":"10.25073/2588-1086/vnucsce.356","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.356","url":null,"abstract":"Recently, Vietnamese speech recognition has been attracted by various research groups in both academics and industry. This paper presents a Vietnamese automatic speech recognition challenge for the eighth annual workshop on Vietnamese Language and Speech Processing (VLSP 2021). There are two sub-tasks in the challenge. The first task is ASR-Task1 focusing on a full pipeline development of the ASR model from scratch with both labeled and unlabeled training data provided by the organizer. The second task is ASR-Task2 focusing on spontaneous speech in different real scenarios e.g., meeting conversation, lecture speech. In the ASR-Task2, participants can use all available data sources to develop their models without any limitations. The quality of the models is evaluated by the Syllable Error Rate (SyER) metric.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128879233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSP 2021 - TTS Challenge: Vietnamese Spontaneous Speech Synthesis","authors":"Nguyen Thi Thu Trang, H. Nguyen","doi":"10.25073/2588-1086/vnucsce.358","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.358","url":null,"abstract":"Text-To-Speech (TTS) was one of nine shared tasks in the eighth annual international VLSP 2021 workshop. All three previous TTS shared tasks were conducted on reading datasets. However, the synthetic voices were not natural enough for spoken dialog systems where the computer must talk to the human in a conversation. Speech datasets recorded in a spontaneous environment help a TTS system to produce more natural voices in speaking style, speaking rate, intonation... Therefore, in this shared task, participants were asked to build a TTS system from a spontaneous speech dataset. This 7.5-hour dataset was collected from a channel of a famous youtuber \"Giang ơi...\"and then pre-processed to build utterances and their corresponding texts. Main challenges at this task this year were: (i) inconsistency in speaking rate, intensity, stress and prosody across the dataset, (ii) background noises or mixed with other voices, and (iii) inaccurate transcripts. A total of 43 teams registered to participate in this shared task, and finally, 8 submissions were evaluated online with perceptual tests. Two types of perceptual tests were conducted: (i) MOS test for naturalness and (ii) SUS (Semantically Unpredictable Sentences) test for intelligibility. The best SUS intelligibility TTS system had a syllable error rate of 15%, while the best MOS score on dialog utterances was 3.98 over 4.54 points on a 5-point MOS scale. The prosody and speaking rate of synthetic voices were similar to the natural one. However, there were still some distorted segments and background noises in most of TTS systems, a half of which had a syllable error rate of at least 30%.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130015394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: TTS - VLSP 2021: Development of Smartcall Vietnamese Text-to-Speech
Authors: Nguyen Quoc Bao, Le Ba Hoai, N. Hoc, Dam Ba Quyen, Nguyen Thu Phuong
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2022-06-30)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.348
Abstract: Recent advances in deep learning facilitate the development of end-to-end Vietnamese text-to-speech (TTS) systems with high intelligibility and naturalness, given a clean training corpus. With a rich source of audio recordings on the Internet, TTS has excellent potential for growth if it can take advantage of this data. However, the quality of such data is often not sufficient for training TTS systems, e.g., because of noisy audio. In this paper, we propose an approach that preprocesses noisy found data from the Internet and trains a high-quality TTS model on the processed data. The VLSP-provided training data was thoroughly preprocessed using 1) voice activity detection, 2) automatic speech recognition-based prosodic punctuation insertion, and 3) Spleeter, a source-separation tool, to separate voice from background music. Moreover, we utilize a state-of-the-art TTS system based on a Conditional Variational Autoencoder with Adversarial Learning. Our experiments show that the proposed TTS system trained on the preprocessed data achieves good results on the provided noisy dataset.
{"title":"ASR - VLSP 2021: An Efficient Transformer-based Approach for Vietnamese ASR Task","authors":"Toan Truong Tien","doi":"10.25073/2588-1086/vnucsce.325","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.325","url":null,"abstract":"Various techniques have been applied to enhance automatic speech recognition during the last few years. Reaching auspicious performance in natural language processing makes Transformer architecture becoming the de facto standard in numerous domains. This paper first presents our effort to collect a 3000-hour Vietnamese speech corpus. After that, we introduce the system used for VLSP 2021 ASR task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72% and gets second place on the private test. Experimental results indicate that the proposed approach dominates traditional methods with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation on the label does not improve the efficiency of the ASR system.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116065865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: ASR - VLSP 2021: Automatic Speech Recognition with Blank Label Re-weighting
Authors: T. Thang, Dang Dinh Son, Le Dang Linh, Dang Xuan Vuong, Duong Quang Tien
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2022-06-30)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.321
Abstract: End-to-end models have significant potential in most languages and have recently proved robust in ASR tasks. Among the many architectures proposed, the Recurrent Neural Network Transducer (RNN-T) has shown remarkable success. However, with background noise or reverberation in spontaneous speech, this architecture generally suffers from high deletion errors. For this reason, we propose a blank label re-weighting technique to improve the state-of-the-art Conformer transducer model. Our system also adopts the Stochastic Weight Averaging approach, which stabilizes the training process. Our work achieved first rank with a word error rate of 4.17% in Task 2 of the VLSP 2021 competition.

Title: SV - VLSP2021: The Smartcall - ITS’s Systems
Authors: Hung Van Dinh, Tuan Van Mai, Quyen B. Dam, Bao Quoc Nguyen
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2022-06-30)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.339
Abstract: This paper presents the Smartcall - ITS systems submitted to the Speaker Verification (SV) task of the Vietnamese Language and Speech Processing (VLSP) 2021 workshop. The challenge consists of two tasks, focusing on developing SV models with limited data and on testing the robustness of SV systems. In both tasks, we used various pre-trained speaker embedding models with different architectures: TDNN and ResNet34. After a specific fine-tuning strategy with data from the organiser, our systems achieved first rank in both tasks, with Equal Error Rates of 1.755% and 1.95%, respectively. In this paper, we describe the systems we developed for both tasks of the VLSP 2021 Speaker Verification shared task.

Title: VLSP 2021 - SV challenge: Vietnamese Speaker Verification in Noisy Environments
Authors: Vi Thanh Dat, Phạm Việt Thành, Nguyen Thi Thu Trang
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2022-06-30)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.333
Abstract: VLSP 2021 was the eighth annual international workshop, whose evaluation campaign was organized at the University of Information Technology, Vietnam National University, Ho Chi Minh City (UIT-VNU-HCM). This was the first time we organized the Speaker Verification shared task, with two subtasks, SV-T1 and SV-T2. SV-T1 focuses on developing SV models with limited data, and SV-T2 focuses on testing the capability and robustness of SV systems. To boost the development of robust models, we collected, processed, and published a speaker dataset recorded in noisy environments, containing 50 hours of speech and more than 1,300 speaker identities. A total of 39 teams registered for this shared task, 15 teams received the dataset, and 7 teams submitted final solutions. The best solution leveraged English pre-trained models and achieved Equal Error Rates of 1.755% and 1.950% for SV-T1 and SV-T2, respectively.

Title: Ultra-High-Throughput Multi-Core AES Encryption Hardware Architecture
Authors: Pham-Khoi Dong, Hung K. Nguyen, F. Hussin, Xuan-Tu Tran
Journal: VNU Journal of Science: Computer Science and Communication Engineering (2021-11-11)
DOI: https://doi.org/10.25073/2588-1086/vnucsce.290
Abstract: Security in high-speed data transfer between devices is always a major challenge. Meanwhile, new data transfer standards such as IEEE P802.3bs 2017 specify data rates of up to 400 Gbps, so encryption needs high throughput to match the transfer rate and low latency to ensure quality of service. In this paper, we propose a multi-core AES encryption hardware architecture that achieves ultra-high-throughput encryption. To reduce area cost and power consumption, the cores share the same KeyExpansion blocks. A fully parallel, outer-round pipeline technique is also applied to the proposed architecture to achieve low-latency encryption. The design has been modelled at the register-transfer level (RTL) in VHDL and synthesized in a 45 nm CMOS technology using Synopsys Design Compiler. With 10 fully parallel cores and an outer-round pipeline, the implementation results show that our architecture achieves a throughput of 1 Tbps at a maximum operating frequency of 800 MHz. These results meet the speed requirements of future communication standards. In addition, our design achieves a high power efficiency of 2377 Gbps/W and an area efficiency of 833 Gbps/mm2, which are 2.6x and 4.5x higher, respectively, than those of the highest-throughput single-core AES design.