Lightweight online punctuation and capitalization restoration for streaming ASR systems
Martin Polacek, Petr Cerva, Jindrich Zdansky
Speech Communication, volume 173, Article 103269. Published 2025-07-05. DOI: 10.1016/j.specom.2025.103269
https://www.sciencedirect.com/science/article/pii/S0167639325000846
Citations: 0
Abstract
This work proposes a lightweight online approach to automatic punctuation and capitalization restoration (APCR). Our method takes pure text as input and can be utilized in real-time speech transcription systems for, e.g., live captioning of TV or radio streams. We develop and evaluate it in a series of consecutive experiments, starting with the task of automatic punctuation restoration (APR). Within that, we also compare our results to another real-time APR method, which combines textual and acoustic features. The test data that we use for this purpose contains automatic transcripts of radio talks and TV debates. In the second part of the paper, we extend our method towards the task of automatic capitalization restoration (ACR). The resulting approach uses two consecutive ELECTRA-small models complemented by simple classification heads; the first ELECTRA model restores punctuation, while the second performs capitalization. Our complete system allows for restoring question marks, commas, periods, and capitalization with a very short inference time and a low latency of just four words. We evaluate its performance for Czech and German, and also compare its results to those of another existing APCR system for English. We are also publishing the data used for our evaluation and testing.
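The cascade described in the abstract (a punctuation model followed by a capitalization model, with each word emitted once four following words are available) can be illustrated with a minimal streaming sketch. This is not the authors' implementation: `toy_punct` and `toy_cap` are hypothetical rule-based stand-ins for the two ELECTRA-small models, `restore_stream` is an invented name, and reading the four-word latency as a fixed right-context window is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

LOOKAHEAD = 4  # assumed right-context size; the abstract reports a latency of four words

# Punctuation tagger: (left context, word, right context) -> "", ",", "." or "?"
PunctFn = Callable[[List[str], str, List[str]], str]
# Capitalization tagger: (previously emitted token, word) -> cased word
CapFn = Callable[[str, str], str]


def toy_punct(left: List[str], word: str, right: List[str]) -> str:
    """Hypothetical stand-in for the first (punctuation) model."""
    if not right:
        return "."  # end of stream: close the sentence
    if right[0] in {"but", "because"}:
        return ","  # crude rule: comma before a conjunction
    return ""


def toy_cap(prev_token: str, word: str) -> str:
    """Hypothetical stand-in for the second (capitalization) model."""
    if prev_token == "" or prev_token[-1] in ".?":
        return word.capitalize()  # sentence-initial word
    return word


def restore_stream(
    words: Iterable[str],
    punct: PunctFn = toy_punct,
    cap: CapFn = toy_cap,
    lookahead: int = LOOKAHEAD,
) -> Iterator[str]:
    """Yield restored tokens, each delayed by at most `lookahead` words."""
    buffer: deque = deque()  # words waiting for enough right context
    left: List[str] = []     # words already emitted (raw, lowercased input)
    prev = ""                # last emitted token, including its punctuation

    def emit(right: List[str]) -> str:
        nonlocal prev
        word = buffer.popleft()
        mark = punct(left, word, right)  # stage 1: punctuation
        token = cap(prev, word) + mark   # stage 2: capitalization
        left.append(word)
        prev = token
        return token

    for w in words:
        buffer.append(w)
        if len(buffer) > lookahead:
            # Oldest word now has `lookahead` words of right context.
            yield emit(list(buffer)[1:])
    while buffer:
        # End of stream: flush the remaining words with whatever context is left.
        yield emit(list(buffer)[1:])


if __name__ == "__main__":
    stream = "it was late but we stayed".split()
    print(" ".join(restore_stream(stream)))
```

In a real system the two stand-in functions would be replaced by model calls over a token window; the buffering logic only shows why the output trails the recognizer by a fixed number of words.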
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.