Improving the Readability of Unformatted Text using Multitask Attention Networks

V. Phan, Minh-Tien Nguyen, L. Bui, Phong Dao Ngoc
Published in: 2021 13th International Conference on Knowledge and Systems Engineering (KSE), 2021-11-10
DOI: https://doi.org/10.1109/KSE53942.2021.9648633

Abstract

Unformatted text is a major obstacle to human reading and degrades the performance of many downstream language understanding tasks. To improve readability, this paper proposes a multitask deep neural model that restores formatting standards, including punctuation and capitalization. Unlike prior research, which usually solved a single task or handled several tasks separately, our model employs multitask learning to perform the restoration tasks simultaneously. The model consists of a backbone network that learns language features and attention-based predictors for the two tasks. To find an efficient encoding method for unformatted text, we analyze the model's behaviour with different backbone architectures, such as convolutional neural networks (CNNs) and unidirectional and bidirectional recurrent networks. The model is validated on two Vietnamese datasets and integrated into an automatic speech recognition (ASR) system. The experiments show promising results for both restoration tasks and demonstrate the applicability of our model.
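
The abstract does not spell out how the two restoration tasks are encoded. One common framing, assumed here purely for illustration, is per-token sequence labeling: each lowercase, unpunctuated token receives one punctuation label and one capitalization label, and a multitask model predicts both label sequences from the same input. The minimal sketch below shows only that data framing, not the model itself. The label names (`COMMA`, `PERIOD`, `QUESTION`, `O`, `CAP`, `LOWER`) and the functions `make_example` and `restore` are hypothetical and do not come from the paper.

```python
# Hypothetical label inventory; the paper's actual label set is not given in the abstract.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
INV_PUNCT = {label: mark for mark, label in PUNCT.items()}


def make_example(formatted):
    """Turn formatted text into unformatted tokens plus two label sequences.

    Returns (tokens, punct_labels, case_labels), all of equal length.
    Only a single trailing mark and an initial capital are handled.
    """
    tokens, punct_labels, case_labels = [], [], []
    for raw in formatted.split():
        mark = raw[-1] if raw[-1] in PUNCT else ""
        word = raw[:-1] if mark else raw
        if not word:
            continue  # skip a stray punctuation mark standing alone
        tokens.append(word.lower())
        punct_labels.append(PUNCT.get(mark, "O"))
        case_labels.append("CAP" if word[0].isupper() else "LOWER")
    return tokens, punct_labels, case_labels


def restore(tokens, punct_labels, case_labels):
    """Apply (predicted) labels to unformatted tokens to rebuild readable text."""
    words = []
    for token, punct, case in zip(tokens, punct_labels, case_labels):
        word = token[:1].upper() + token[1:] if case == "CAP" else token
        words.append(word + INV_PUNCT.get(punct, ""))
    return " ".join(words)


if __name__ == "__main__":
    text = "Hà Nội is the capital of Vietnam. Is it large?"
    tokens, punct_labels, case_labels = make_example(text)
    print(tokens)
    print(punct_labels)
    print(case_labels)
    print(restore(tokens, punct_labels, case_labels))
```

Under this framing, a shared backbone would encode `tokens` once, and two predictors would each emit one of the label sequences. With gold labels, `restore` reproduces the original sentence for inputs that use only the marks and the casing pattern handled here.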