Improving the Readability of Unformatted Text using Multitask Attention Networks

2021 13th International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2021-11-10. DOI: 10.1109/KSE53942.2021.9648633

V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc

Abstract

Unformatted text hinders human reading and degrades the performance of many downstream language understanding tasks. To improve readability, this paper proposes a multitask deep neural model that restores formatting standards, including punctuation and capitalization. Unlike prior research, which usually solved a single task or several tasks separately, our model employs multitask learning to perform the restoration tasks simultaneously. The model consists of a backbone network that learns language features and attention-based predictors for the two tasks. To find an efficient encoding method for unformatted text, we analyze the model's behaviour with different backbone architectures, such as convolutional neural networks (CNN) and unidirectional and bidirectional recurrent networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show promising results for both restoration tasks and demonstrate the applicability of our model.
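
To make the two restoration tasks concrete, the sketch below shows one way to derive per-token training labels from formatted text: each lowercased token receives a punctuation label and a capitalization label, so a single shared input can supervise both predictors. The label names (`COMMA`, `PERIOD`, `QUESTION`, `O`, `UPPER`, `LOWER`) and the helper `make_example` are assumptions chosen for illustration; the abstract does not specify the tag inventory or preprocessing the authors used.

```python
import re

# Hypothetical punctuation tag set; the paper's actual inventory is not
# stated in the abstract.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def make_example(formatted: str):
    """Turn a formatted sentence into (tokens, punct_labels, cap_labels).

    tokens       -- lowercased words with punctuation stripped (model input)
    punct_labels -- the mark that followed each word, or "O" for none
    cap_labels   -- "UPPER" if the original word began with a capital letter
    """
    tokens, punct, caps = [], [], []
    for raw in formatted.split():
        # Separate one trailing punctuation mark (if any) from the word.
        match = re.match(r"^(.*?)([,.?]?)$", raw)
        word, mark = match.group(1), match.group(2)
        if not word:
            continue  # drop a stray mark that has no word attached
        tokens.append(word.lower())
        punct.append(PUNCT_LABELS.get(mark, "O"))
        caps.append("UPPER" if word[0].isupper() else "LOWER")
    return tokens, punct, caps


if __name__ == "__main__":
    toks, p, c = make_example("Hello, how are you? I am in Hanoi.")
    for row in zip(toks, p, c):
        print(row)
```

Because both label sequences align with the same token sequence, a multitask model of the kind the abstract describes could feed the tokens through one shared backbone and train one predictor per label sequence.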
