A Comparative Study of Machine Learning Methods for Baby Cry Detection Using MFCC Features

Putri Agustina Riadi, M. Faisal, D. Kartini, Radityo Adi Nugroho, D. T. Nugrahadi, Dike Bayu Magfira
Journal: Journal of Electronics, Electromedical Engineering, and Medical Informatics
Published: 2024-01-12
DOI: 10.35882/jeeemi.v6i1.350 (https://doi.org/10.35882/jeeemi.v6i1.350)
Citations: 0

Abstract

Infant vocalization, commonly known as baby crying, is one of the primary means by which infants communicate their needs and emotional states to adults. Although crying can yield crucial insights into a baby's well-being and comfort, little research has specifically investigated how the audio range, that is, the duration of the cry analyzed, affects classification outcomes. The core problem addressed here is this lack of research on the influence of audio range on baby cry classification with machine learning. The purpose of this study is to determine the impact of the duration of an infant's cry on machine learning classification results and to assess the accuracy and F1 score obtained with each machine learning method. The contribution is to enrich the understanding of how classification and feature selection apply to audio datasets, particularly baby cry audio. The dataset used, donate-a-cry-corpus, comprises five distinct classes, with audio clips seven seconds in duration. The methodology consists of the spectrogram technique, cross-validation for data partitioning, MFCC feature extraction with 10, 20, and 30 coefficients, and three machine learning models: Support Vector Machine, Random Forest, and Naïve Bayes. The Random Forest model achieved an accuracy of 0.844 and an F1 score of 0.773 with 10 MFCC coefficients and an optimal audio range of six seconds. The Support Vector Machine model with an RBF kernel yielded an accuracy of 0.836 and an F1 score of 0.761, while the Naïve Bayes model achieved an accuracy of 0.538 and an F1 score of 0.539. Notably, no discernible differences were observed for the Support Vector Machine and Naïve Bayes methods across the 1-7 second trials. This research lays a foundation for advancing premature illness identification techniques based on infant vocalizations, thereby facilitating faster diagnosis for pediatric practitioners.
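
The experimental design described in the abstract (clip durations of 1-7 seconds, 10/20/30 MFCC coefficients, three classifiers, scored by accuracy and F1) can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the function names, the model name strings, the class labels in the usage example, and the choice of macro-averaged F1 are all assumptions, since the abstract does not state how F1 was averaged or how clips were trimmed. Feature extraction and model training are deliberately left out.

```python
from itertools import product

DURATIONS_S = range(1, 8)   # 1-7 second audio ranges, as stated in the abstract
N_MFCC = (10, 20, 30)       # MFCC coefficient counts, as stated in the abstract
MODELS = ("svm_rbf", "random_forest", "naive_bayes")  # placeholder names (assumption)


def trim_to_seconds(samples, sample_rate, seconds):
    """Keep only the first `seconds` of a mono waveform (assumed trimming rule)."""
    return samples[: int(seconds * sample_rate)]


def experiment_grid():
    """Enumerate every (duration, n_mfcc, model) configuration to evaluate."""
    return list(product(DURATIONS_S, N_MFCC, MODELS))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy labels; not data from the study.
    y_true = ["hungry"] * 8 + ["tired"] * 2
    y_pred = ["hungry"] * 10
    print(len(experiment_grid()), "configurations")
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

On imbalanced toy labels like these, a classifier that always predicts the majority class scores well on accuracy but poorly on macro F1. That is one plausible reason a study would report both metrics, and why the two reported numbers for a model can differ noticeably.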