Refining maritime Automatic Speech Recognition by leveraging synthetic speech

Impact Factor 3.9 | JCR Q2 (Transportation)
Christoph Martius, Emin Çağatay Nakilcioğlu, Maximilian Reimann, Ole John
Maritime Transport Research, Volume 7, Article 100114. Journal article, published 2024-08-12.
DOI: 10.1016/j.martra.2024.100114
URL: https://www.sciencedirect.com/science/article/pii/S2666822X24000121
Citations: 0

Abstract

Refining maritime Automatic Speech Recognition by leveraging synthetic speech

Maritime transport serves as a critical component of global trade and logistics, enabling the movement of goods and resources across oceans and waterways. Especially in busy waterways and ports, effective and accurate communication is essential, as it ensures the seamless exchange of information and the coordinated execution of port activities. However, comprehensibility is often hindered by factors such as poor audio quality, background noise, and diverse languages and accents. Automatic Speech Recognition (ASR) systems can mitigate these issues by providing real-time transcription and enabling the implementation of automated, value-adding services to enhance situational awareness. While pre-trained ASR models excel on general speech, maritime ASR faces unique challenges due to a lack of annotated data, diverse accents, and specialized terminology.

To this end, we aim to improve the transcription quality of pre-trained ASR models for maritime communication, with particular emphasis on accurately recognizing maritime-specific terminology such as vessel and location names. Because transcribed maritime communication is scarce, we create a synthetic training dataset tailored to regional maritime terminology. The synthetic audio is augmented with general human speech and used to fine-tune an end-to-end ASR model under various settings. The models are evaluated on a proprietary dataset of regional maritime radio communication from the port of Hamburg.
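
The data preparation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the vessel names, location list, sentence templates, and the `synthetic_share` parameter are all hypothetical, and the text-to-speech step is omitted because the abstract does not name a synthesis engine. The sketch only shows how terminology-rich transcripts might be generated from templates and how synthetic and human samples might be mixed at a chosen ratio.

```python
import random

# Hypothetical vocabulary and templates for illustration; not taken from the paper.
VESSELS = ["Elbe Star", "Nordic Wind"]
LOCATIONS = ["Finkenwerder", "Waltershof"]
TEMPLATES = [
    "{vessel} passing {location} inbound",
    "Hamburg Port this is {vessel} requesting berth at {location}",
]


def make_transcripts(n, rng):
    """Fill random templates with random vessel and location names."""
    return [
        rng.choice(TEMPLATES).format(
            vessel=rng.choice(VESSELS), location=rng.choice(LOCATIONS)
        )
        for _ in range(n)
    ]


def mix_datasets(synthetic, human, synthetic_share, size, rng):
    """Draw `size` samples (with replacement), a fixed share of them synthetic."""
    n_synthetic = round(size * synthetic_share)
    n_human = size - n_synthetic
    mixed = [("synthetic", s) for s in rng.choices(synthetic, k=n_synthetic)]
    mixed += [("human", h) for h in rng.choices(human, k=n_human)]
    rng.shuffle(mixed)  # avoid feeding all synthetic samples first
    return mixed


rng = random.Random(0)  # fixed seed so the example is reproducible
synthetic_texts = make_transcripts(20, rng)
human_texts = ["good morning everyone", "please repeat the last message"]
training_set = mix_datasets(synthetic_texts, human_texts, 0.5, 10, rng)
print(training_set[:3])
```

In a real setup each synthetic transcript would be rendered to audio by a speech synthesizer before fine-tuning, and the mixing ratio would be one of the settings to vary.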

The experimental results show a notable improvement in ASR performance. Specifically, relative to the pre-trained baseline, our approach improves word error rate (WER) by 13.46% absolute and increases recall by 41.57% for vessel names and 38.65% for locations. Our findings underscore the value of synthetic training data for addressing the challenges of maritime ASR, paving the way for more robust and accurate speech recognition systems tailored to maritime applications.
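
For readers unfamiliar with the two metrics reported above, the following Python sketch shows one common way to compute them. It is an illustration under stated assumptions, not the paper's evaluation code: WER is computed as word-level edit distance divided by reference length, and entity recall uses simple case-insensitive substring matching, which is a simplification. The vessel name and utterances are invented.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def entity_recall(pairs, entities):
    """Share of entity mentions in the references that also appear in the hypotheses."""
    found = total = 0
    for reference, hypothesis in pairs:
        ref_l, hyp_l = reference.lower(), hypothesis.lower()
        for entity in entities:
            if entity.lower() in ref_l:
                total += 1
                found += entity.lower() in hyp_l
    return found / total if total else 0.0


# Invented (reference, hypothesis) pairs; "Elbe Star" is a made-up vessel name.
pairs = [
    ("hamburg port this is elbe star requesting berth",
     "hamburg port this is elba star requesting berth"),
    ("elbe star proceed to finkenwerder",
     "elbe star proceed to finkenwerder"),
]
print(wer(*pairs[0]))                        # 0.125 (1 substitution / 8 words)
print(entity_recall(pairs, ["Elbe Star"]))   # 0.5 (name recovered in 1 of 2 mentions)
```

An absolute improvement is then the plain difference between the baseline score and the fine-tuned score, expressed in percentage points.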

Source journal metrics: CiteScore 5.90; self-citation rate 0.00%; articles published: 0