Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data

M. Kocour, Karel Veselý, Igor Szöke, Santosh Kesiraju, Juan Zuluaga-Gómez, Alexander Blatt, Amrutha Prasad, Iuliia Nigmatulina, P. Motlícek, D. Klakow, Allan Tart, H. Atassi, Pavel Kolčárek, Honza Černocký, Claudia Cevenini, K. Choukri, M. Rigault, Fabian Landis, Saeed Sarfjoo, Chloe Salamin
Engineering Proceedings, published 2021-12-31. DOI: 10.3390/engproc2021013008 (https://doi.org/10.3390/engproc2021013008)
Citation count: 9

Abstract

This document describes the pipeline for automatic processing of ATCO-pilot audio communication that we developed as part of the ATCO2 project. So far, we have collected two thousand hours of audio recordings, which we either preprocessed for the transcribers or used for semi-supervised training. Both uses of the collected data allow us to retrain our models and thereby further improve the pipeline. The proposed pipeline is a cascade of standalone components: (a) segmentation, (b) volume control, (c) signal-to-noise ratio filtering, (d) diarization, (e) a speech-to-text (ASR) module, (f) English language detection, (g) callsign code recognition, (h) ATCO-pilot classification and (i) highlighting of commands and values. The key component is the speech-to-text transcription system, which has to be trained on real-world ATC data; otherwise, its performance is poor. To further improve speech-to-text performance, we apply both semi-supervised training on our recordings and contextual adaptation that uses a list of plausible callsigns from surveillance data as auxiliary information. Downstream NLP/NLU tasks are important from an application point of view. These tasks need accurate models that operate on real speech-to-text output, so they too require more data. Creating ATC data is the main aspiration of the ATCO2 project. At the end of the project, the data will be packaged and distributed by ELDA.
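
The cascade structure named in the abstract can be illustrated with a minimal sketch. It is not the ATCO2 implementation: the `Segment` type, the stage functions, the 10 dB threshold, the 0.8 match threshold and the callsign table are all hypothetical, the speech-to-text stage is a fixed placeholder, and the callsign stage swaps in plain string matching against a list of plausible callsigns rather than the contextual adaptation the abstract describes. Only stages (c), (e) and (g) are mimicked, and the noise power is assumed to be estimated elsewhere.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Segment:
    samples: list[float]            # audio samples of one segment (non-empty)
    noise_power: float              # assumed positive, estimated elsewhere
    text: str = ""                  # filled in by the speech-to-text stage
    callsign: Optional[str] = None  # filled in by the callsign stage


# A stage returns the (possibly updated) segment, or None to drop it.
Stage = Callable[[Segment], Optional[Segment]]


def snr_db(seg: Segment) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    signal_power = sum(x * x for x in seg.samples) / len(seg.samples)
    return 10.0 * math.log10(signal_power / seg.noise_power)


def snr_filter(min_db: float) -> Stage:
    """Stage (c): drop segments whose SNR is below a hypothetical threshold."""
    def stage(seg: Segment) -> Optional[Segment]:
        return seg if snr_db(seg) >= min_db else None
    return stage


def placeholder_asr(seg: Segment) -> Segment:
    """Stage (e): stand-in for a real speech-to-text system (fixed output)."""
    seg.text = "lufthansa four five six descend flight level eight zero"
    return seg


def callsign_stage(plausible: dict[str, str]) -> Stage:
    """Stage (g): pick the plausible callsign whose spoken form best matches
    some window of the transcript. `plausible` maps code -> spoken form."""
    def stage(seg: Segment) -> Segment:
        words = seg.text.split()
        best, best_score = None, 0.0
        for code, spoken in plausible.items():
            n = len(spoken.split())
            for i in range(len(words) - n + 1):
                window = " ".join(words[i:i + n])
                score = SequenceMatcher(None, window, spoken).ratio()
                if score > best_score:
                    best, best_score = code, score
        # Hypothetical acceptance threshold; weaker matches are left unset.
        seg.callsign = best if best_score >= 0.8 else None
        return seg
    return stage


def run_cascade(seg: Segment, stages: list[Stage]) -> Optional[Segment]:
    """Apply the stages in order; stop as soon as one drops the segment."""
    for stage in stages:
        result = stage(seg)
        if result is None:
            return None
        seg = result
    return seg
```

In this sketch each stage is an independent callable, which mirrors the "standalone components" wording: a stage can be replaced or retrained without touching the others, and a filtering stage ends processing early by returning `None`.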