{"title":"使用多任务注意网络提高未格式化文本的可读性","authors":"V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc","doi":"10.1109/KSE53942.2021.9648633","DOIUrl":null,"url":null,"abstract":"Unformatted text is a big obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve the readability, this paper proposes a multitask deep neural model to restore format standards including punctuation and capitalization. Unlike prior research which usually solved a single task or many tasks separately, our model employs multitask learning to simultaneously perform the restoration tasks. The model consists of a backbone network to learn language features, and attention-based predictors for the two tasks. To find the efficient encoding method for unformatted text, we analyze the model behaviour with different backbone architectures such as convolutional neural networks (CNN), unidirectional and bidirectional recurrent-based networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show the promising results for both restoration tasks and the applicability of our model.","PeriodicalId":130986,"journal":{"name":"2021 13th International Conference on Knowledge and Systems Engineering (KSE)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving the Readability of Unformatted Text using Multitask Attention Networks\",\"authors\":\"V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc\",\"doi\":\"10.1109/KSE53942.2021.9648633\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unformatted text is a big obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve the readability, this paper proposes a multitask deep neural model to restore format standards including punctuation and capitalization. Unlike prior research which usually solved a single task or many tasks separately, our model employs multitask learning to simultaneously perform the restoration tasks. The model consists of a backbone network to learn language features, and attention-based predictors for the two tasks. To find the efficient encoding method for unformatted text, we analyze the model behaviour with different backbone architectures such as convolutional neural networks (CNN), unidirectional and bidirectional recurrent-based networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. 
The experiments show the promising results for both restoration tasks and the applicability of our model.\",\"PeriodicalId\":130986,\"journal\":{\"name\":\"2021 13th International Conference on Knowledge and Systems Engineering (KSE)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Knowledge and Systems Engineering (KSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/KSE53942.2021.9648633\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Knowledge and Systems Engineering (KSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE53942.2021.9648633","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving the Readability of Unformatted Text using Multitask Attention Networks
Abstract: Unformatted text is a major obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve readability, this paper proposes a multitask deep neural model that restores formatting conventions, namely punctuation and capitalization. Unlike prior research, which usually solved a single task or several tasks separately, our model employs multitask learning to perform the restoration tasks simultaneously. The model consists of a backbone network that learns language features and attention-based predictors for the two tasks. To find an efficient encoding method for unformatted text, we analyze the model's behaviour with different backbone architectures, such as convolutional neural networks (CNN) and unidirectional and bidirectional recurrent networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show promising results on both restoration tasks and demonstrate the applicability of our model.
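The abstract describes the architecture only at a high level: a shared backbone that encodes the unformatted text, plus two attention-based predictors, one per restoration task, trained jointly. The sketch below is a hedged illustration of that pattern, not the authors' released code; it assumes PyTorch, a bidirectional LSTM backbone (one of the variants the paper compares), self-attention task heads, and small illustrative label sets. The exact layers, label inventories, and hyperparameters are not given in the abstract and are assumptions here.

```python
# Minimal sketch of a multitask punctuation/capitalization restorer.
# All sizes, label sets, and the BiLSTM backbone choice are illustrative assumptions.
import torch
import torch.nn as nn

class MultitaskRestorer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256,
                 num_punct_labels=4, num_case_labels=2, num_heads=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Shared backbone: a bidirectional recurrent encoder (a CNN or
        # unidirectional RNN could be swapped in for comparison).
        self.backbone = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                                bidirectional=True)
        # Task-specific self-attention layers over the backbone states,
        # each followed by a per-token classifier.
        self.punct_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.case_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.punct_head = nn.Linear(hidden_dim, num_punct_labels)  # e.g. none, comma, period, question mark
        self.case_head = nn.Linear(hidden_dim, num_case_labels)    # e.g. lowercase vs. capitalized

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq, emb_dim)
        h, _ = self.backbone(x)            # (batch, seq, hidden_dim)
        p, _ = self.punct_attn(h, h, h)    # punctuation-specific attention
        c, _ = self.case_attn(h, h, h)     # capitalization-specific attention
        return self.punct_head(p), self.case_head(c)

# Joint training would sum the two per-token cross-entropy losses, e.g.:
# loss = ce(punct_logits.transpose(1, 2), punct_tags) + ce(case_logits.transpose(1, 2), case_tags)
```

In this setup, the multitask objective described in the abstract is realized simply by summing the per-token losses of the two heads while the backbone parameters are shared across tasks.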