APyCA: Towards the automatic subtitling of television content in Spanish

Aitor Álvarez, A. D. Pozo, Andoni Arruti
Published in: Proceedings of the International Multiconference on Computer Science and Information Technology, October 2010
DOI: 10.1109/IMCSIT.2010.5680055
Citations: 12

Abstract

Automatic subtitling of television content has become a tractable challenge thanks to advances in the underlying technology. It has also become a priority for many Spanish TV broadcasters, who will have to subtitle up to 90% of their broadcast content by 2013 to comply with recently approved national audiovisual policies. APyCA, the prototype system described in this paper, was developed to automate the subtitling of television content in Spanish by applying state-of-the-art speech and language technologies. Voice activity detection, automatic speech recognition and alignment, discourse segment detection and speaker diarization have proved useful for generating time-coded, colour-assigned draft transcriptions for post-editing. The productivity gain of this approach depends heavily on the performance of the speech recognition module, which achieves reasonable results on clean read speech but degrades as the speech becomes noisier and/or more spontaneous.
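As an illustration of the final stage named in the abstract, the sketch below shows one way word-level timestamps and diarization labels could be turned into time-coded, colour-assigned draft subtitles. It is not the APyCA implementation: the `Word` record, the `PALETTE` colours, the pause and line-length thresholds, and the SRT-style output with `<font>` tags are all assumptions made for the example, and the upstream modules (voice activity detection, recognition, alignment, diarization) are taken as given.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One recognised word with alignment times (hypothetical record layout)."""
    text: str
    start: float   # seconds
    end: float     # seconds
    speaker: str   # diarization label, e.g. "S1"


# Hypothetical palette; the source does not specify which colours are used.
PALETTE = ["yellow", "cyan", "green", "magenta", "white"]


def timecode(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm timecode."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segment(words: List[Word], max_pause: float = 0.5,
            max_chars: int = 37) -> List[List[Word]]:
    """Start a new subtitle at a speaker change, a long pause, or a full line.

    Both thresholds are illustrative defaults, not values from the paper.
    """
    segments: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            prev = current[-1]
            length = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            if (w.speaker != prev.speaker
                    or w.start - prev.end > max_pause
                    or length > max_chars):
                segments.append(current)
                current = []
        current.append(w)
    if current:
        segments.append(current)
    return segments


def draft_subtitles(words: List[Word]) -> str:
    """Render segments as numbered, time-coded blocks, one colour per speaker."""
    colours: Dict[str, str] = {}
    blocks = []
    for i, seg in enumerate(segment(words), start=1):
        spk = seg[0].speaker
        if spk not in colours:
            # Assign colours in order of first appearance, cycling if needed.
            colours[spk] = PALETTE[len(colours) % len(PALETTE)]
        text = " ".join(w.text for w in seg)
        blocks.append(
            f"{i}\n{timecode(seg[0].start)} --> {timecode(seg[-1].end)}\n"
            f'<font color="{colours[spk]}">{text}</font>\n'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Word("Buenas", 0.0, 0.4, "S1"),
        Word("tardes", 0.45, 0.9, "S1"),
        Word("Hola", 1.5, 1.8, "S2"),
    ]
    print(draft_subtitles(demo))
```

Because the output is only a draft, recognition errors in noisy or spontaneous speech would still have to be corrected by a post-editor; the sketch only fixes timing and speaker colour once the upstream labels exist.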