Yongbo Qiu, Xin Yang, Siqi Yang, Yuyou Gong, Qinrui Lv, Bo Yang
Journal of Voice, DOI: 10.1016/j.jvoice.2024.08.022, published 2024-09-20. Journal Article. JCR Q1 (Audiology & Speech-Language Pathology), impact factor 2.5.
Classification of Infant Cry Based on Hybrid Audio Features and ResLSTM.
Crying is one of the primary means by which infants communicate with their environment in the early stages of life. These cries can be triggered by physiological factors such as hunger or sleepiness, or by pathological factors such as illness or discomfort. Therefore, analyzing infant cries can assist inexperienced parents in better caring for their babies. Most studies have predominantly utilized a single speech feature, such as Mel Frequency Cepstral Coefficients (MFCC), for classifying infant cries, while other speech features, such as Mel Spectrogram and Tonnetz, are often overlooked. In this study, we manually designed a hybrid feature set, MMT (including MFCC, Mel Spectrogram, and Tonnetz), and explored its application in infant cry classification. Additionally, we proposed a convolutional neural network based on residual connections and long short-term memory (LSTM) networks, termed ResLSTM. We compared the performance of different deep learning models using the hybrid feature set MMT and the single MFCC feature. This study utilized the Baby Crying, Dunstan Baby Language, and Donate a Cry datasets. The results indicate that the hybrid feature set MMT outperforms the single MFCC feature. The MMT combined with the ResLSTM method achieved the best performance, obtaining accuracy rates of 94.15%, 92.92%, and 95.98% on the three datasets, respectively.
About the journal:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.