Classification of Infant Cry Based on Hybrid Audio Features and ResLSTM
Yongbo Qiu, Xin Yang, Siqi Yang, Yuyou Gong, Qinrui Lv, Bo Yang
Journal of Voice (impact factor 2.5; JCR Q1, Audiology & Speech-Language Pathology). Journal article, published 2024-09-20.
DOI: 10.1016/j.jvoice.2024.08.022 (https://doi.org/10.1016/j.jvoice.2024.08.022)
Citations: 0
Abstract
Crying is one of the primary means by which infants communicate with their environment in the early stages of life. Cries can be triggered by physiological factors such as hunger or sleepiness, or by pathological factors such as illness or discomfort, so analyzing infant cries can help inexperienced parents care for their babies. Most studies have relied on a single speech feature, such as Mel Frequency Cepstral Coefficients (MFCC), to classify infant cries, while other speech features, such as the Mel Spectrogram and Tonnetz, are often overlooked. In this study, we manually designed a hybrid feature set, MMT (comprising MFCC, Mel Spectrogram, and Tonnetz), and explored its application to infant cry classification. We also proposed ResLSTM, a convolutional neural network based on residual connections and long short-term memory (LSTM) networks. We compared the performance of different deep learning models using the hybrid feature set MMT and the single MFCC feature on three datasets: Baby Crying, Dunstan Baby Language, and Donate a Cry. The results indicate that the hybrid feature set MMT outperforms the single MFCC feature. MMT combined with ResLSTM achieved the best performance, with accuracies of 94.15%, 92.92%, and 95.98% on the three datasets, respectively.
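The abstract does not say how the three feature types are combined into MMT. As a rough illustration only, the sketch below assumes each feature is already available as a (features x frames) matrix and stacks them along the feature axis. The function name `fuse_mmt`, the 40 MFCC and 128 mel-band sizes, the frame trimming, and the per-block standardisation are all assumptions made for this example, not details taken from the paper; only the 6-dimensional size of Tonnetz is inherent to that feature.

```python
import numpy as np


def fuse_mmt(mfcc: np.ndarray, mel: np.ndarray, tonnetz: np.ndarray) -> np.ndarray:
    """Stack three (n_features, n_frames) matrices into one hybrid matrix.

    Hypothetical helper: the paper's actual fusion step may differ.
    """
    # Different extractors can disagree by a frame or two, so trim every
    # matrix to the shortest frame count before stacking.
    n_frames = min(mfcc.shape[1], mel.shape[1], tonnetz.shape[1])
    parts = [m[:, :n_frames] for m in (mfcc, mel, tonnetz)]
    # Assumption: standardise each block separately so that the 128-row
    # mel block does not swamp the 6-row Tonnetz block.
    normed = [(p - p.mean()) / (p.std() + 1e-8) for p in parts]
    return np.concatenate(normed, axis=0)


# Random stand-ins for real features; the shapes are illustrative only.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 100))     # 40 MFCCs (assumed)
mel = rng.normal(size=(128, 101))     # 128 mel bands (assumed)
tonnetz = rng.normal(size=(6, 99))    # Tonnetz has 6 dimensions
mmt = fuse_mmt(mfcc, mel, tonnetz)
print(mmt.shape)  # (174, 99)
```

The resulting matrix keeps the time axis intact, which is what a recurrent layer such as an LSTM would consume frame by frame. Averaging over time before concatenation is another common choice, at the cost of discarding temporal structure.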
About the journal:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.