Title: Effective infant cry signal analysis and reasoning using IARO based leaky Bi-LSTM model
Authors: B.M. Mala, Smita Sandeep Darandale
Journal: Computer Speech and Language, volume 86, Article 101621
DOI: 10.1016/j.csl.2024.101621
Publication date: 2024-01-24
Publication type: Journal Article
Impact factor: 3.1 (JCR Q2, Computer Science, Artificial Intelligence)
Open access: No
URL: https://www.sciencedirect.com/science/article/pii/S0885230824000044
Citation count: 0
Abstract
Recognizing particular emotions or needs from an infant's cry is a difficult pattern recognition problem because the cry carries no verbal information. This article introduces an automated model for the effective recognition of infant cries. First, infant cry signals are collected from the Baby Chillanto (BC) dataset and the Donate a Cry Corpus (DCC) dataset. The acquired signals are converted into feature vectors using nine techniques: Zero Crossing Rate (ZCR), acoustic features, audio features, amplitude, energy, Root Mean Square (RMS), statistical moments, autocorrelation, and Mel-Frequency Cepstral Coefficients (MFCCs). Because the resulting feature vectors are multi-dimensional, a Simulated Annealing Algorithm (SAA) is employed to select the informative feature vectors. The selected feature vectors are passed to a leaky Bi-directional Long Short-Term Memory (Bi-LSTM) model, which classifies the type of infant cry. Specifically, in the leaky Bi-LSTM model, the conventional activation functions (hyperbolic tangent (tanh) and sigmoid) are replaced with the leaky Rectified Linear Unit (leaky ReLU) activation function. This replacement mitigates the vanishing gradient problem and improves convergence during training, which is vital for signal classification tasks. Furthermore, an Improved Artificial Rabbits Optimization (IARO) algorithm is proposed to choose optimal hyper-parameters for the leaky Bi-LSTM model, which reduces the complexity and training time of the classification model. In the IARO algorithm, selective opposition and Lévy flight strategies are integrated with the conventional ARO algorithm to enhance the dynamics and diversity of the population, along with the model's tracking efficiency. The empirical investigation shows that the proposed IARO-based leaky Bi-LSTM model achieves classification accuracies of 99.66% and 95.92% on the BC and DCC datasets, respectively, which are higher than those of the conventional recognition models it is compared with.
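To make the feature-extraction step more concrete, the following is a minimal sketch of five of the time-domain descriptors named in the abstract (ZCR, amplitude, energy, RMS, and autocorrelation), computed on a single frame of samples. It uses textbook definitions; the abstract does not state the framing, window length, or normalization the authors used, so the function names, the `lag` parameter, and the synthetic 400 Hz test tone are all illustrative assumptions. MFCCs and the remaining feature groups are omitted.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)


def rms(frame):
    """Root mean square amplitude of the frame."""
    return math.sqrt(energy(frame) / len(frame))


def autocorrelation(frame, lag):
    """Autocorrelation at `lag`, normalized by the frame energy."""
    total = energy(frame)
    if total == 0:
        return 0.0
    overlap = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return overlap / total


def frame_features(frame, lag=1):
    """Collect the descriptors into one feature dictionary (hypothetical layout)."""
    return {
        "zcr": zero_crossing_rate(frame),
        "peak_amplitude": max(abs(x) for x in frame),
        "energy": energy(frame),
        "rms": rms(frame),
        "autocorr": autocorrelation(frame, lag),
    }


# Synthetic stand-in for a cry frame: a 400 Hz tone sampled at 8 kHz,
# so one period spans exactly 20 samples (assumed values, not from the paper).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(400)]
features = frame_features(frame, lag=20)
print(features)
```

Choosing the lag equal to the tone's period makes the autocorrelation peak, which is why autocorrelation is commonly used as a cue for the fundamental frequency of a cry. In a full pipeline, per-frame dictionaries like this would be stacked into the feature vectors that the selection step then prunes.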
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.