A prosody inspired RNN approach for punctuation of machine produced speech transcripts to improve human readability
A. Moro, György Szaszák
2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), 2017-09-01
DOI: 10.1109/COGINFOCOM.2017.8268246 (https://doi.org/10.1109/COGINFOCOM.2017.8268246)
Speech-based human-machine interfaces rely on automatic speech recognition for speech-to-text conversion. Unfortunately, not much effort has been devoted in the past to adding punctuation marks to the recognized word chain after speech recognition. The missing punctuation impairs human readability and makes interpretation hard. This paper presents an effort to restore punctuation marks while keeping the latency introduced by this post-processing step low. The approach exploits prosodic structure and proposes a sequential modelling paradigm based on recurrent neural networks. Results show satisfactory punctuation restoration, particularly given that sentence boundaries are reliably detected. Even if the predicted punctuation sequence is not error-free with respect to writing standards, human readers are expected to “repair” these errors more easily than they could cope with text that has no punctuation at all, which leaves them confused about the basic segmentation of the word chain.