Keyword identification framework for speech communication on construction sites

A. Mansoor, Shuai Liu, G. M. Ali, A. Bouferguene, M. Al-Hussein, Imran Hassan
Published in: Modular and Offsite Construction (MOC) Summit Proceedings, 2022-09-14. DOI: 10.29173/mocs271 (https://doi.org/10.29173/mocs271)

Abstract

Effective worksite communication is key to teamwork and worker performance on construction sites, and most of that communication is spoken. Construction sites, however, are typically noisy because of tasks such as drilling and the operation of heavy equipment. Workers also come from a range of ethnic and linguistic backgrounds and speak with different accents. Together, noise and accent variation can make it difficult for a listener to understand the speaker, leading to miscommunication and errors in decision making. Recent technological advances can be leveraged to mitigate this problem. In this paper, a keyword identification framework is developed for speech communication on the construction site. For this framework, 12 hours of raw audio containing 18 crane signalman speech commands (referred to as “keywords”) are collected; the crane signalman uses these keywords to guide the crane operator during crane operations. Two-second audio clips (the approximate duration of each keyword) are extracted from the raw audio, and construction site noise is added. Mel-frequency cepstral coefficients (MFCCs) are then extracted from the waveform audio and used to train a one-dimensional convolutional neural network. After training, the model achieves a training accuracy of 97.3%, a validation accuracy of 96.1%, and a testing accuracy of 93.8%. When deployed for real-time identification of keywords in speech, the model achieves an accuracy of 95.3%. These findings indicate that the framework is suitable for real-time identification of specific keywords in speech on noisy construction sites.
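The clip-extraction and noise-addition steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the sample rate, the noise level, or how the noise was mixed in, so the 16 kHz rate, the 10 dB signal-to-noise ratio, and the function names `split_into_clips` and `mix_at_snr` below are all assumptions made for illustration, not the authors' implementation. MFCC extraction and the convolutional network are left out because their parameters are not given.

```python
import math
import random


def split_into_clips(samples, sample_rate, clip_seconds=2.0):
    """Cut a waveform into consecutive fixed-length clips (short tail dropped)."""
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be positive")
    last_start = len(samples) - clip_len
    return [samples[i:i + clip_len] for i in range(0, last_start + 1, clip_len)]


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio is snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale to a target SNR")
    # SNR in dB is 20*log10(rms_speech / rms_noise), so solve for the noise gain.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 16_000  # assumed; not stated in the abstract
    # Synthetic stand-ins: a 440 Hz tone for speech, Gaussian noise for site noise.
    tone = [math.sin(2 * math.pi * 440 * t / sample_rate)
            for t in range(5 * sample_rate)]
    hiss = [random.gauss(0.0, 1.0) for _ in range(2 * sample_rate)]
    clips = split_into_clips(tone, sample_rate)
    noisy = [mix_at_snr(clip, hiss, snr_db=10.0) for clip in clips]
    print(len(clips), len(noisy[0]))
```

In a real pipeline the noisy clips would then be passed to an MFCC extractor, and the resulting coefficient sequences used as input to the one-dimensional convolutional network.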