{"title":"使用多任务注意网络提高未格式化文本的可读性","authors":"V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc","doi":"10.1109/KSE53942.2021.9648633","DOIUrl":null,"url":null,"abstract":"Unformatted text is a big obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve the readability, this paper proposes a multitask deep neural model to restore format standards including punctuation and capitalization. Unlike prior research which usually solved a single task or many tasks separately, our model employs multitask learning to simultaneously perform the restoration tasks. The model consists of a backbone network to learn language features, and attention-based predictors for the two tasks. To find the efficient encoding method for unformatted text, we analyze the model behaviour with different backbone architectures such as convolutional neural networks (CNN), unidirectional and bidirectional recurrent-based networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show the promising results for both restoration tasks and the applicability of our model.","PeriodicalId":130986,"journal":{"name":"2021 13th International Conference on Knowledge and Systems Engineering (KSE)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving the Readability of Unformatted Text using Multitask Attention Networks\",\"authors\":\"V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc\",\"doi\":\"10.1109/KSE53942.2021.9648633\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unformatted text is a big obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve the readability, this paper proposes a multitask deep neural model to restore format standards including punctuation and capitalization. Unlike prior research which usually solved a single task or many tasks separately, our model employs multitask learning to simultaneously perform the restoration tasks. The model consists of a backbone network to learn language features, and attention-based predictors for the two tasks. To find the efficient encoding method for unformatted text, we analyze the model behaviour with different backbone architectures such as convolutional neural networks (CNN), unidirectional and bidirectional recurrent-based networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. 
The experiments show the promising results for both restoration tasks and the applicability of our model.\",\"PeriodicalId\":130986,\"journal\":{\"name\":\"2021 13th International Conference on Knowledge and Systems Engineering (KSE)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Knowledge and Systems Engineering (KSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/KSE53942.2021.9648633\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Knowledge and Systems Engineering (KSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE53942.2021.9648633","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving the Readability of Unformatted Text using Multitask Attention Networks
Abstract: Unformatted text is a major obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve readability, this paper proposes a multitask deep neural model that restores formatting conventions, namely punctuation and capitalization. Unlike prior research, which usually solved a single task or several tasks separately, our model employs multitask learning to perform the restoration tasks simultaneously. The model consists of a backbone network that learns language features and attention-based predictors for the two tasks. To find an efficient encoding method for unformatted text, we analyze the model's behaviour with different backbone architectures, such as convolutional neural networks (CNN) and unidirectional and bidirectional recurrent networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show promising results on both restoration tasks and demonstrate the applicability of our model.
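The abstract describes the architecture only at a high level: a shared backbone that encodes the unformatted text, plus two attention-based predictors, one per restoration task, trained jointly. The sketch below is a hedged illustration of that pattern, not the authors' released code; it assumes PyTorch, a bidirectional LSTM backbone (one of the variants the paper compares), self-attention task heads, and small illustrative label sets. The exact layers, label inventories, and hyperparameters are not given in the abstract and are assumptions here.

```python
# Minimal sketch of a multitask punctuation/capitalization restorer.
# All sizes, label sets, and the BiLSTM backbone choice are illustrative assumptions.
import torch
import torch.nn as nn

class MultitaskRestorer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256,
                 num_punct_labels=4, num_case_labels=2, num_heads=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Shared backbone: a bidirectional recurrent encoder (a CNN or
        # unidirectional RNN could be swapped in for comparison).
        self.backbone = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                                bidirectional=True)
        # Task-specific self-attention layers over the backbone states,
        # each followed by a per-token classifier.
        self.punct_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.case_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.punct_head = nn.Linear(hidden_dim, num_punct_labels)  # e.g. none, comma, period, question mark
        self.case_head = nn.Linear(hidden_dim, num_case_labels)    # e.g. lowercase vs. capitalized

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq, emb_dim)
        h, _ = self.backbone(x)            # (batch, seq, hidden_dim)
        p, _ = self.punct_attn(h, h, h)    # punctuation-specific attention
        c, _ = self.case_attn(h, h, h)     # capitalization-specific attention
        return self.punct_head(p), self.case_head(c)

# Joint training would sum the two per-token cross-entropy losses, e.g.:
# loss = ce(punct_logits.transpose(1, 2), punct_tags) + ce(case_logits.transpose(1, 2), case_tags)
```

In this setup, the multitask objective described in the abstract is realized simply by summing the per-token losses of the two heads while the backbone parameters are shared across tasks.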