Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data

Engineering Proceedings | Pub Date: 2021-12-31 | DOI: 10.3390/engproc2021013008

M. Kocour, Karel Veselý, Igor Szöke, Santosh Kesiraju, Juan Zuluaga-Gómez, Alexander Blatt, Amrutha Prasad, Iuliia Nigmatulina, P. Motlícek, D. Klakow, Allan Tart, H. Atassi, Pavel Kolčárek, Honza Černocký, Claudia Cevenini, K. Choukri, M. Rigault, Fabian Landis, Saeed Sarfjoo, Chloe Salamin


Citations: 9

Abstract



This document describes the pipeline for automatic processing of air traffic controller (ATCO) and pilot audio communication that we developed as part of the ATCO2 project. So far, we have collected two thousand hours of audio recordings, which we either preprocessed for the transcribers or used for semi-supervised training. Both uses of the collected data can further improve the pipeline, because they let us retrain our models. The proposed automatic processing pipeline is a cascade of standalone components: (a) segmentation, (b) volume control, (c) signal-to-noise ratio filtering, (d) diarization, (e) a speech-to-text (ASR) module, (f) English language detection, (g) call-sign code recognition, (h) ATCO-pilot classification, and (i) highlighting of commands and values. The key component is the speech-to-text transcription system, which has to be trained with real-world air traffic control (ATC) data; otherwise, its performance is poor. To further improve speech-to-text performance, we apply both semi-supervised training on our recordings and contextual adaptation, which uses a list of plausible callsigns from surveillance data as auxiliary information. Downstream NLP/NLU tasks are important from an application point of view. These tasks need accurate models that operate on real speech-to-text output, so they also need more data. Creating ATC data is the main aspiration of the ATCO2 project. At the end of the project, the data will be packaged and distributed by ELDA.
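As a rough illustration of what a cascade of standalone components can look like, the Python sketch below chains three filter-style stages: signal-to-noise ratio filtering, English language detection, and call-sign recognition against a list of plausible callsigns. Everything in it is an assumption made for this sketch and not the ATCO2 implementation: the `Segment` fields, the thresholds (5 dB and 0.5), the two-entry airline table, and the plain string matching. In particular, the abstract describes callsign lists being used as contextual adaptation of the speech-to-text system, whereas this sketch only matches verbalized callsigns against a finished transcript.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Segment:
    """One audio segment with hypothetical metadata from earlier stages."""
    text: str              # transcript, assumed to come from the ASR module
    snr_db: float          # estimated signal-to-noise ratio in dB
    english_score: float   # language-detector confidence in [0, 1]
    callsign: Optional[str] = None


# A stage returns the (possibly annotated) segment, or None to drop it.
Stage = Callable[[Segment], Optional[Segment]]

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

# Illustrative designator-to-telephony table (assumed, far from complete).
AIRLINES = {"DLH": "lufthansa", "RYR": "ryanair"}


def snr_filter(min_db: float) -> Stage:
    """Drop segments whose estimated SNR is below min_db."""
    return lambda seg: seg if seg.snr_db >= min_db else None


def english_filter(min_score: float) -> Stage:
    """Drop segments the language detector does not consider English."""
    return lambda seg: seg if seg.english_score >= min_score else None


def verbalize(callsign: str, airlines: Dict[str, str]) -> str:
    """Expand a code such as 'DLH123' into 'lufthansa one two three'."""
    prefix, suffix = callsign[:3], callsign[3:]
    words = [airlines.get(prefix, prefix.lower())]
    words += [DIGITS.get(ch, ch.lower()) for ch in suffix]
    return " ".join(words)


def callsign_matcher(plausible: List[str], airlines: Dict[str, str]) -> Stage:
    """Annotate a segment with the first plausible callsign found in its text."""
    spoken = {verbalize(code, airlines): code for code in plausible}
    # Try longer phrases first so 'DLH123' wins over 'DLH12'.
    phrases = sorted(spoken, key=len, reverse=True)

    def stage(seg: Segment) -> Optional[Segment]:
        padded = f" {seg.text} "
        for phrase in phrases:
            if f" {phrase} " in padded:   # match on whole words only
                seg.callsign = spoken[phrase]
                break
        return seg

    return stage


def run_cascade(seg: Segment, stages: List[Stage]) -> Optional[Segment]:
    """Apply stages in order; stop as soon as one of them drops the segment."""
    current: Optional[Segment] = seg
    for stage in stages:
        current = stage(current)
        if current is None:
            return None
    return current


stages = [snr_filter(5.0), english_filter(0.5),
          callsign_matcher(["RYR45", "DLH12", "DLH123"], AIRLINES)]
clean = Segment("lufthansa one two three descend flight level eight zero",
                snr_db=12.0, english_score=0.9)
noisy = Segment("ryanair four five contact tower", snr_db=1.0, english_score=0.9)
kept = run_cascade(clean, stages)
dropped = run_cascade(noisy, stages)
```

Because each stage has the same signature, stages can be reordered, removed, or replaced independently, which is the practical benefit of building the pipeline from standalone components.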

