A Systematic Review of Advances in Infant Cry Paralinguistic Classification: Methods, Implementation, and Applications.

JCR: Q2 (Medicine)
Geofrey Owino, Bernard Shibwabo
JMIR Rehabilitation and Assistive Technologies. Published 2025-02-18. DOI: 10.2196/69457
Citations: 0

Abstract

Background: Effective communication is essential for human interaction, yet infants can express their needs only through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. Precise interpretation of infant cries could provide valuable insights into an infant's health, needs, and well-being, enabling prompt medical or caregiving action.

Objective: This study systematically reviews advances in the methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on infant cry classification techniques, feature extraction methods, and practical applications. It also aims to identify recent trends and directions in infant cry signal processing that address both academic and practical needs.

Methods: A systematic literature review was conducted using nine electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 records were initially retrieved, and 126 studies met the eligibility criteria after screening by two independent reviewers. The methodological quality of the studies was assessed using the Cochrane risk-of-bias tool version 2 (RoB2): 92% (n=116) of the studies had a low risk of bias, and 8% (n=10) raised some concerns. The overall quality assessment followed the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. Data analysis was conducted using R version 3.64.

Results: Infant cry classification methods advanced notably, particularly from 2019 onward, with the adoption of machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients (MFCCs), spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile applications and web-based platforms for real-time analysis; however, 90% (n=113) of the models had not been deployed in real-world applications. Denoising techniques and federated learning, used to enhance model robustness and ensure data confidentiality, were employed in only 5% (n=6) of the studies. Practical applications spanned healthcare monitoring, diagnostics, and caregiver support.
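To make three of the listed features concrete, the following is a minimal Python sketch (standard library only) of zero-crossing rate, RMS energy as an intensity proxy, and an autocorrelation pitch estimate, preceded by a frame-energy noise gate as a crude stand-in for a denoising step. All function names, the 400-sample frame, the 0.05 gate threshold, and the 150-1000 Hz pitch search range are illustrative assumptions, not choices taken from the reviewed studies. MFCCs, spectrograms, and chroma are usually computed with dedicated audio libraries and are omitted here.

```python
import math
import random


def frame_rms(frame):
    """Root-mean-square amplitude of a frame, used here as a simple intensity proxy."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(signal, frame_len=400, threshold=0.05):
    """Hypothetical preliminary denoising layer: silence frames whose RMS falls below threshold.

    A frame-energy gate is an illustrative stand-in, not a method from the reviewed studies.
    """
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if frame_rms(frame) < threshold:
            out.extend([0.0] * len(frame))  # treat low-energy frames as background noise
        else:
            out.extend(frame)  # keep frames that likely contain cry energy
    return out


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (pairs touching exact zeros are skipped)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, sr, fmin=150.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) as the lag with maximal autocorrelation."""
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag


def extract_features(frame, sr):
    """Bundle the three frame-level features into a dict (keys are illustrative)."""
    return {
        "zcr": zero_crossing_rate(frame),
        "rms": frame_rms(frame),
        "f0_hz": autocorr_pitch(frame, sr),
    }


# Usage on synthetic audio: a 0.1 s, 450 Hz tone followed by 0.05 s of faint noise.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 450 * n / sr) for n in range(1600)]
rng = random.Random(0)
noise = [rng.uniform(-0.01, 0.01) for _ in range(800)]
gated = noise_gate(tone + noise)
features = extract_features(gated[:1600], sr)
print(features)
```

On this synthetic input the gate silences the noise frames and the pitch estimate falls close to 450 Hz (limited by integer-lag resolution). An unnormalized autocorrelation like this one favors shorter lags and can make octave errors on real cries, so it is only a starting point.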

Conclusions: Infant cry classification has progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment in practice. Further research is therefore proposed to develop standardized, foundational multimodal audio approaches that incorporate a broader range of audio features and ensure data confidentiality through methods such as federated learning. In addition, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements are expected to enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse healthcare settings.
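Federated learning, proposed in the conclusions for data confidentiality, keeps recordings at each site and shares only model parameters. The sketch below shows federated averaging (FedAvg) in plain Python: each hypothetical site fits a one-feature logistic regression on its private data, and a server averages the returned parameters weighted by local sample counts. The two toy datasets, the single scalar feature, the learning rate, and all function names are assumptions for illustration; a real cry classifier would use richer features and typically a deep learning framework.

```python
import math
from typing import List, Sequence, Tuple

# (scalar cry feature, binary label); the label meaning is hypothetical, e.g. pain vs. hunger.
Dataset = List[Tuple[float, int]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights: Sequence[float], data: Dataset, lr: float = 0.5, epochs: int = 20) -> List[float]:
    """Full-batch gradient descent on one site's private data; only [w, b] is returned."""
    w, b = weights
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return [w, b]


def federated_average(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: parameter-wise mean weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_training(clients: List[Dataset], rounds: int = 5) -> List[float]:
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally;
        # raw recordings and features never leave the site.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


# Usage with two hypothetical sites holding toy, already-extracted features.
site_a = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
site_b = [(-1.5, 0), (-0.5, 0), (0.5, 1), (1.5, 1), (2.5, 1)]
w, b = federated_training([site_a, site_b])
print(f"w={w:.3f}, b={b:.3f}")
```

Plain FedAvg avoids pooling raw data but does not by itself guarantee privacy, since shared parameter updates can leak information; secure aggregation or differential privacy is often layered on top.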

Source journal metrics
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 31
Review time: 12 weeks