Biomimetic Computing for Efficient Spoken Language Identification.

Impact factor 3.4; Zone 3 (Medicine); JCR Q1 (Engineering, Multidisciplinary)

Biomimetics. Published: 2025-05-14. DOI: 10.3390/biomimetics10050316

Gaurav Kumar, Saurabh Bhardwaj

Volume 10, Issue 5. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108623/pdf/


Abstract

Applications based on Spoken Language Identification (SLID) have become increasingly important in everyday life, driven by advances in artificial intelligence and machine learning. Multilingual countries use SLID to facilitate speech detection by determining the language of spoken segments with language recognizers. In multilingual datasets, however, languages that share a common origin are difficult to classify accurately with automatic techniques. A further challenge is the large variance in speech signals caused by differences in speaker, content, acoustic setting, and language, by age- and gender-dependent changes in voice modulation, and by variation in speech patterns. In this study, we introduce the DBODL-MSLIS approach, which integrates biomimetic optimization techniques inspired by natural intelligence to enhance language classification. The method combines Dung Beetle Optimization (DBO), which simulates the beetle's foraging behavior, with deep learning to optimize feature selection and classification performance. Speech preprocessing comprises pre-emphasis, windowing, and frame blocking. Features are then extracted using pitch, energy, the Discrete Wavelet Transform (DWT), and the zero-crossing rate (ZCR). The DBO algorithm performs feature selection, removing redundant features to improve efficiency and accuracy. Spoken languages are classified by a long short-term memory (LSTM) network used in conjunction with Bayesian optimization (BO). The DBODL-MSLIS technique was validated experimentally on the IIIT Spoken Language dataset, where it achieved an average accuracy of 95.54% and an F-score of 84.31%.
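The preprocessing steps and two of the features named above (energy and ZCR) can be sketched roughly as follows in plain Python. This is a minimal illustration, not the authors' implementation: the pre-emphasis coefficient of 0.97, the Hamming window, and the frame length and hop of 400 and 160 samples are assumed values, since the abstract gives no settings, and all function names are hypothetical. Pitch and DWT features are omitted.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed default."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_blocks(signal, frame_len, hop):
    """Split into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(frame_len):
    """Hamming window coefficients (requires frame_len >= 2)."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(signal, frame_len=400, hop=160):
    """Return one (energy, zcr) pair per windowed frame of the pre-emphasized signal."""
    emphasized = pre_emphasis(signal)
    window = hamming(frame_len)
    features = []
    for frame in frame_blocks(emphasized, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        features.append((short_time_energy(windowed), zero_crossing_rate(windowed)))
    return features
```

Calling `frame_features` on a list of audio samples yields one `(energy, zcr)` tuple per frame. A feature selector such as DBO would then operate on vectors built from these and the remaining features.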
The technique surpasses several state-of-the-art models, including SVM, MLP, LDA, DLA-ASLISS, HMHFS-IISLFAS, GA-based fusion, and VGG-16. We also compared its accuracy against the biomimetic computing models GA, PSO, GWO, DE, and ACO. ACO achieved up to 89.45% accuracy, whereas our Bayesian optimization with LSTM outperformed all of them with a peak accuracy of 95.55%, demonstrating its effectiveness for spoken language identification. The technique shows promise for practical applications in multilingual voice processing.
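The abstract reports both an average accuracy and an F-score but does not say how either was averaged across languages. The sketch below shows one common convention, assumed here purely for illustration: one-vs-rest accuracy averaged over classes, and the macro-averaged F-score, both computed from a confusion matrix. The function names and the example matrix are hypothetical and are not data from the paper.

```python
def per_class_counts(confusion):
    """confusion[i][j] = samples of true class i predicted as class j.

    Returns one (tp, fp, fn, tn) tuple per class, treating each class one-vs-rest.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    counts = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def average_accuracy(confusion):
    """Mean over classes of the one-vs-rest accuracy (tp + tn) / total."""
    counts = per_class_counts(confusion)
    return sum(
        (tp + tn) / (tp + fp + fn + tn) for tp, fp, fn, tn in counts
    ) / len(counts)


def macro_f_score(confusion):
    """Unweighted mean over classes of F1 = 2*tp / (2*tp + fp + fn)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(confusion):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 3-language confusion matrix, 10 test utterances per language.
example = [
    [8, 1, 1],
    [1, 8, 1],
    [1, 1, 8],
]
```

For `example`, `average_accuracy` gives about 0.867 and `macro_f_score` gives 0.8. Under this convention true negatives raise accuracy but not the F-score, so accuracy can sit above the F-score. Whether this accounts for the gap between the reported figures is not stated in the abstract.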


