Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features

IF 1.1 | Zone 3, Computer Science | JCR Q4, Computer Science, Artificial Intelligence
Malefia Demilie Melese, Amlakie Aschale Alemu, Ayodeji Olalekan Salau, Ibrahim Gashaw Kasa
DOI: 10.1080/0954898X.2024.2359610 (https://doi.org/10.1080/0954898X.2024.2359610)
Journal: Network-Computation in Neural Systems
Published: 2024-06-04
Publication type: Journal Article
Citations: 0

Abstract


Natural language is widely used for information exchange between humans and computers in modern digital environments, and natural language processing (NLP) is a basic requirement for technological advancement in speech recognition. Language identification (LID) is a prerequisite for further speech tasks such as speech-to-text translation, speech-to-speech translation, speaker recognition, and speech information retrieval. In this paper, we develop an LID model for Ethio-Semitic languages. The model combines a hybrid architecture, a convolutional recurrent neural network (CRNN), with a hybrid feature set of Mel-frequency cepstral coefficients (MFCCs) and mel-spectrograms. The study covers four Ethio-Semitic languages: Amharic, Ge'ez, Guragigna, and Tigrinya. Data augmentation expanded the original dataset from 8 h of audio to 24 h 40 min. With the proposed features, the model achieved average accuracies of 98.1%, 98.6%, and 99.9% on testing, validation, and training, respectively. The CRNN with the combined mel-spectrogram and MFCC features achieved the best results among the existing models it was compared against.
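
As a rough illustration of what a combined mel-spectrogram and MFCC feature can look like, the sketch below computes both for a single audio frame using only the Python standard library and joins them into one vector. It is not the authors' implementation: the abstract does not say how the two features are combined, so the concatenation, the 8 kHz sample rate, the 256-sample frame, the 20 mel bands, the 13 coefficients, and all function names are assumptions chosen for the example. In practice a library such as librosa would normally be used in place of the naive DFT.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(frame, sample_rate, n_mels):
    """One column of a log mel-spectrogram."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    # Small constant avoids log(0) for empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
            for row in bank]


def mfcc_from_log_mel(log_mel_energies, n_mfcc):
    """Unnormalised DCT-II of the log-mel energies, first n_mfcc coefficients."""
    n = len(log_mel_energies)
    return [sum(e * math.cos(math.pi * k * (i + 0.5) / n)
                for i, e in enumerate(log_mel_energies))
            for k in range(n_mfcc)]


def hybrid_features(frame, sample_rate=8000, n_mels=20, n_mfcc=13):
    """Assumed combination: log-mel values followed by MFCCs in one vector."""
    mel_part = log_mel(frame, sample_rate, n_mels)
    return mel_part + mfcc_from_log_mel(mel_part, n_mfcc)


# Toy input: one frame of a 440 Hz tone (a stand-in for real speech audio).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
features = hybrid_features(frame, sample_rate)
print(len(features))  # 20 log-mel values + 13 MFCCs = 33
```

Applying this frame by frame over an utterance would give a two-dimensional feature map with time on one axis, which is the kind of input a CRNN's convolutional layers typically consume before the recurrent layers model the sequence.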

Source journal
Network-Computation in Neural Systems (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 3.70
Self-citation rate: 1.30%
Articles published: 22
Review time: >12 weeks
About the journal: Network: Computation in Neural Systems welcomes submissions of research papers that integrate theoretical neuroscience with experimental data, emphasizing the utilization of cutting-edge technologies. We invite authors and researchers to contribute their work in the following areas: Theoretical Neuroscience: This section encompasses neural network modeling approaches that elucidate brain function. Neural Networks in Data Analysis and Pattern Recognition: We encourage submissions exploring the use of neural networks for data analysis and pattern recognition, including but not limited to image analysis and speech processing applications. Neural Networks in Control Systems: This category encompasses the utilization of neural networks in control systems, including robotics, state estimation, fault detection, and diagnosis. Analysis of Neurophysiological Data: We invite submissions focusing on the analysis of neurophysiology data obtained from experimental studies involving animals. Analysis of Experimental Data on the Human Brain: This section includes papers analyzing experimental data from studies on the human brain, utilizing imaging techniques such as MRI, fMRI, EEG, and PET. Neurobiological Foundations of Consciousness: We encourage submissions exploring the neural bases of consciousness in the brain and its simulation in machines.