A Comparative Study of Machine Learning Methods for Baby Cry Detection Using MFCC Features

Journal of Electronics, Electromedical Engineering, and Medical Informatics. Published 2024-01-12. DOI: 10.35882/jeeemi.v6i1.350

Putri Agustina Riadi, M. Faisal, D. Kartini, Radityo Adi Nugroho, D. T. Nugrahadi, Dike Bayu Magfira

Abstract

Crying is one of the primary means by which infants communicate their needs and emotional states to adults. Although a cry can yield crucial insights into a baby's well-being and comfort, little research has specifically investigated how the audio range (the duration of the cry segment analyzed) affects machine learning classification of baby cries; this gap is the core problem addressed here. The purpose of this study is to determine how the duration of an infant's cry affects machine learning classification outcomes and to measure the accuracy and F1 score that the machine learning methods achieve. Its contribution is to deepen the understanding of classification and feature selection on audio datasets, particularly baby cry audio. The dataset used, the donate-a-cry-corpus, contains five classes, and each recording lasts seven seconds. The methodology combines the spectrogram technique, cross-validation for data partitioning, MFCC feature extraction with 10, 20, and 30 coefficients, and three machine learning models: Support Vector Machine, Random Forest, and Naïve Bayes. The Random Forest model achieved an accuracy of 0.844 and an F1 score of 0.773 with 10 MFCC coefficients and an optimal audio range of six seconds. The Support Vector Machine with an RBF kernel achieved an accuracy of 0.836 and an F1 score of 0.761, while the Naïve Bayes model achieved an accuracy of 0.538 and an F1 score of 0.539. Notably, the Support Vector Machine and Naïve Bayes results showed no discernible differences across audio durations from 1 to 7 seconds. This research lays a foundation for developing techniques that identify illness early from infant vocalizations, which could speed up diagnosis for pediatric practitioners.
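The pipeline summarized in the abstract (cutting each recording to a fixed duration, extracting 10, 20, or 30 MFCCs, and cross-validating SVM, Random Forest, and Naïve Bayes classifiers) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`trim_or_pad`, `mel_filterbank`, `mfcc`, `clip_features`, `compare_models`) are hypothetical; the "audio range" is assumed to mean the first N seconds of each clip; MFCC frames are assumed to be mean-pooled into one vector per clip; and the frame size, hop length, number of mel filters, feature scaling before the SVM, 5-fold stratified splitting, and macro-averaged F1 are illustrative choices not taken from the paper. The MFCC step is written in plain NumPy for transparency; in practice a library routine such as `librosa.feature.mfcc` would usually be used.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def trim_or_pad(signal, sr, seconds):
    """Keep the first `seconds` of audio (assumed meaning of "audio range"); zero-pad shorter clips."""
    n = int(round(sr * seconds))
    out = np.zeros(n, dtype=float)
    m = min(n, len(signal))
    out[:m] = signal[:m]
    return out


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sr, n_mfcc, n_fft=512, hop=256, n_mels=40):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a 1-D signal."""
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrogram: one row per Hamming-windowed frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # Orthonormal DCT-II over the mel axis; keep the first n_mfcc coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T


def clip_features(signal, sr, seconds, n_mfcc):
    """Cut/pad to `seconds`, compute MFCCs, and mean-pool over time (assumed pooling)."""
    clip = trim_or_pad(np.asarray(signal, dtype=float), sr, seconds)
    return mfcc(clip, sr, n_mfcc).mean(axis=0)


def compare_models(X, y, n_splits=5):
    """Cross-validated mean accuracy and macro F1 for the three classifier families."""
    # scikit-learn is imported here so the feature code above needs only NumPy.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    models = {
        # Standardizing before the RBF SVM is an assumed, common practice.
        "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Random Forest": RandomForestClassifier(random_state=0),
        "Naive Bayes": GaussianNB(),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
        results[name] = (scores["test_accuracy"].mean(), scores["test_f1_macro"].mean())
    return results
```

To run a duration sweep like the one described, one would call `clip_features` for each `seconds` value from 1 to 7 and each `n_mfcc` in (10, 20, 30), stack the resulting vectors into `X`, and pass `X` together with the class labels `y` to `compare_models`. Mean-pooling is used here only because it turns clips of any length into fixed-size vectors that all three classifiers accept.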
