Advances in Infant Cry Paralinguistic Classification-Methods, Implementation, and Applications: Systematic Review.


JMIR Rehabilitation and Assistive Technologies. Pub Date: 2025-04-29. DOI: 10.2196/69457

Geofrey Owino, Bernard Shibwabo


Abstract

Background: Effective communication is essential for human interaction, yet infants can express their needs only through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.

Objective: This study systematically reviews advances in the methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on infant cry classification techniques, feature extraction methods, and practical applications. It also aims to identify recent trends and directions in infant cry signal processing to address both academic and practical needs.

Methods: A systematic literature review was conducted using 9 electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, with 126 studies meeting the eligibility criteria after screening by 2 independent reviewers. The methodological quality of these studies was assessed using the Cochrane risk of bias tool (version 2; RoB2), with 92% (n=116) of the studies indicating a low risk of bias and 8% (n=10) of the studies showing some concerns regarding bias. The overall quality assessment was performed using TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R (version 3.64; R Foundation).

Results: Notable advances in infant cry classification methods were realized, particularly from 2019 onward, using machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients, spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile apps and web-based platforms for real-time analysis, although 90% (n=113) of the models remained undeployed in real-world applications. Denoising techniques and federated learning saw limited use, appearing in only 5% (n=6) of the studies, to enhance model robustness and ensure data confidentiality. Practical applications spanned health care monitoring, diagnostics, and caregiver support.
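
As an illustration of two of the simpler features named above, the following minimal sketch computes a per-frame zero-crossing rate and an RMS intensity with NumPy. It is not taken from the reviewed studies: the function names, the frame and hop lengths, and the synthetic 400 Hz tone standing in for a cry recording are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def rms_intensity(frames):
    """Root-mean-square amplitude of each frame, a simple intensity proxy."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Synthetic stand-in for a recording (assumption): 1 s of a 400 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 400 * t)

# 50 ms frames with a 25 ms hop (illustrative values, not from the review).
frames = frame_signal(tone, frame_len=400, hop_len=200)
zcr = zero_crossing_rate(frames)
rms = rms_intensity(frames)
```

For a pure tone of frequency f sampled at rate sr, the zero-crossing rate is close to 2f/sr, so the values here sit near 0.1. Real cry signals are noisy and non-stationary, which is one reason such features are usually combined with spectral ones such as Mel-frequency cepstral coefficients.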

Conclusions: Infant cry classification has progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment in practice. Further research is thus proposed to develop standardized foundational audio multimodal approaches, incorporating a broader range of audio features and ensuring data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse health care settings.
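
To make the federated learning proposal concrete, the sketch below shows the aggregation step of federated averaging, in which a server combines model parameters trained at separate sites without receiving the raw recordings. It is a hedged illustration only: the review does not specify an algorithm, and the function name, the two hypothetical sites, and their sample counts are assumptions.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample count."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# Hypothetical parameters from two sites; each list holds one array per layer.
site_a = [np.array([1.0, 2.0]), np.array([0.5])]
site_b = [np.array([3.0, 4.0]), np.array([1.5])]

# Site B holds three times as many recordings, so it gets three times the weight.
global_weights = federated_average([site_a, site_b], client_sizes=[100, 300])
```

Only the parameter arrays leave each site in this scheme. A full system would also need repeated training rounds, secure transport, and possibly additional privacy protections, none of which are shown here.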
