Transfer Learning for Punctuation Prediction

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Pub Date: 2019-11-01, DOI: 10.1109/APSIPAASC47483.2019.9023200

Karan Makhija, Thi-Nga Ho, Chng Eng Siong

Citations: 29

Abstract

The output of most Automatic Speech Recognition (ASR) systems is a continuous sequence of words without proper punctuation. This reduces human readability and degrades the performance of downstream natural language processing tasks on ASR text. We treat punctuation prediction as a sequence tagging task and propose an architecture that uses pre-trained BERT embeddings. Our model significantly improves on the state of the art on the IWSLT dataset. We achieve an overall F1 of 81.4% on the joint prediction of period, comma and question mark.
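To illustrate the sequence tagging formulation, the following is a minimal sketch in which each word is labeled with the punctuation mark that follows it. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`), the lowercasing step, and the helper functions `text_to_tags` and `tags_to_text` are assumptions made for illustration; the abstract does not specify the exact tag set or preprocessing, and the BERT-based tagger itself is not shown.

```python
# Hypothetical label names: the abstract names the three marks (period, comma,
# question mark) but not the tag set, so these identifiers are assumptions.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_TO_LABEL.items()}
NO_PUNCT = "O"


def text_to_tags(text):
    """Turn punctuated text into (word, label) pairs.

    Each label names the punctuation mark that follows the word,
    or "O" when no mark follows it.
    """
    pairs = []
    for token in text.split():
        label = NO_PUNCT
        # Strip one trailing mark and record it as the word's label.
        if token[-1] in PUNCT_TO_LABEL:
            label = PUNCT_TO_LABEL[token[-1]]
            token = token[:-1]
        # Skip tokens that consisted only of a punctuation mark.
        if token:
            # Lowercasing mimics unpunctuated, uncased ASR output (an assumption).
            pairs.append((token.lower(), label))
    return pairs


def tags_to_text(pairs):
    """Re-insert punctuation marks from (word, label) pairs."""
    return " ".join(word + LABEL_TO_PUNCT.get(label, "") for word, label in pairs)


pairs = text_to_tags("Is this clear? Yes, it is.")
print(pairs)
print(tags_to_text(pairs))
```

Under this framing, a tagger receives only the word sequence and predicts one label per word, and `tags_to_text` shows how predicted labels would be turned back into punctuated text.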

