Optimized Deep Neural Networks Audio Tagging Framework for Virtual Business Assistant

Pub Date: 2023-01-01 DOI: 10.12720/jait.14.3.550-558
Fatma Sh. El-metwally, Ali I. Eldesouky, Nahla B. Abdel-Hamid, Sally M. Elghamrawy

Abstract

A virtual assistant has a large impact on a business and on an organization's development. It can be used to manage customer relations, handle incoming queries, and automatically reply to e-mails and phone calls. Audio signal processing has become increasingly popular since the development of virtual assistants, and advances in deep learning and audio signal processing have dramatically improved audio tagging. Audio Tagging (AT) is the task of eliciting descriptive labels from audio clips. This study proposes an Optimized Deep Neural Networks Audio Tagging Framework for Virtual Business Assistant to categorize and analyze audio clips by tagging them. Various audio features are extracted from each input signal. The extracted features are fed into a neural network that performs multi-label classification to predict the tags. Optimization techniques are used to improve the quality of the neural network's model fit. To test the efficiency of the framework, four experiments compared it with other approaches. The results indicate that the framework is more efficient than the others. When the neural network was trained, Mel-Frequency Cepstral Coefficient (MFCC) features with the Adamax optimizer achieved the best results, with 93% accuracy and a 0.17% loss. When the model was evaluated on seven labels, it achieved an average precision of 0.952, recall of 0.952, F-score of 0.951, accuracy of 0.983, and an equal error rate of 0.015 on the evaluation set, compared to the provided Detection and Classification of Acoustic Scenes and Events (DCASE) baseline, which achieved an accuracy of 72.5% and
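The multi-label evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 0.5 decision threshold on per-label scores and macro averaging over labels; the abstract does not state either choice. The function names, the three-label setup, and the toy scores are hypothetical and are not the paper's data or code.

```python
from typing import List, Sequence, Tuple


def binarize(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label scores into 0/1 tag decisions (threshold is an assumption)."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def per_label_prf(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> List[Tuple[float, float, float]]:
    """Compute (precision, recall, F-score) separately for each label."""
    results = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f_score))
    return results


def macro_average(per_label: Sequence[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Average each metric over labels with equal weight per label."""
    n = len(per_label)
    return (sum(m[0] for m in per_label) / n,
            sum(m[1] for m in per_label) / n,
            sum(m[2] for m in per_label) / n)


# Toy data: 4 clips, 3 hypothetical labels (the paper evaluates 7 labels).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
scores = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.4], [0.6, 0.3, 0.1], [0.2, 0.6, 0.7]]

y_pred = binarize(scores)
per_label = per_label_prf(y_true, y_pred)
macro = macro_average(per_label)
print("per-label (P, R, F):", per_label)
print("macro (P, R, F):", macro)
```

Macro averaging weights every label equally regardless of how often it occurs; micro averaging, which pools the counts over all labels before dividing, would be an equally plausible reading of "average" here.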