Title: Advances in Infant Cry Paralinguistic Classification-Methods, Implementation, and Applications: Systematic Review
Authors: Geofrey Owino, Bernard Shibwabo
Journal: JMIR Rehabilitation and Assistive Technologies, e69457
Publication date: 2025-04-29
Publication type: Journal Article
DOI: 10.2196/69457 (https://doi.org/10.2196/69457)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076029/pdf/
JCR: Q2 (Medicine)
Citation count: 0
Abstract
Background: Effective communication is essential for human interaction, yet infants can express their needs only through different types of cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.
Objective: This study aimed to systematically review the advancements in methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on infant cry classification techniques, feature extraction methods, and practical applications. Furthermore, we aimed to identify recent trends and directions in the field of infant cry signal processing to address both academic and practical needs.
Methods: A systematic literature review was conducted using 9 electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, with 126 studies meeting the eligibility criteria after screening by 2 independent reviewers. The methodological quality of these studies was assessed using the Cochrane risk of bias tool (version 2; RoB2), with 92% (n=116) of the studies indicating a low risk of bias and 8% (n=10) of the studies showing some concerns regarding bias. The overall quality assessment was performed using TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R (version 3.64; R Foundation).
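The reported percentages can be checked against the study counts. The short sketch below recomputes them from the numbers given in the Methods; the variable names and the `share` helper are illustrative and do not come from the review.

```python
# Counts as reported in the Methods paragraph.
retrieved = 5904        # initial search results
included = 126          # studies meeting the eligibility criteria
low_risk = 116          # RoB2: low risk of bias
some_concerns = 10      # RoB2: some concerns


def share(count, total):
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


# The two risk-of-bias groups should account for every included study.
assert low_risk + some_concerns == included

low_risk_pct = share(low_risk, included)            # reported as 92%
some_concerns_pct = share(some_concerns, included)  # reported as 8%
inclusion_pct = round(100 * included / retrieved, 1)  # share of retrieved records kept

print(low_risk_pct, some_concerns_pct, inclusion_pct)
```

The inclusion rate is not stated in the abstract; it is derived here only to show how selective the screening was.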
Results: Notable advancements in infant cry classification methods were realized, particularly from 2019 onward, using machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients, spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile apps and web-based platforms for real-time analysis, but 90% (n=113) of the models remained undeployed in real-world applications. Denoising techniques and federated learning, which enhance model robustness and protect data confidentiality, were used in only 5% (n=6) of the studies. Practical applications spanned health care monitoring, diagnostics, and caregiver support.
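As an illustration of one of the simpler features listed above, the following sketch computes a frame-wise zero-crossing rate in plain Python. It is a minimal example under stated assumptions, not the procedure used by any reviewed study: the synthetic tone stands in for a cry recording, and the sample rate, frame length, hop size, and function names are all hypothetical choices.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# Synthetic stand-in for a cry recording: one second of a 400 Hz tone at 8 kHz.
# The small phase offset keeps samples away from exact zeros.
sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 400 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]

frames = frame_signal(tone, frame_len=400, hop=200)
rates = [zero_crossing_rate(frame) for frame in frames]

# A pure tone crosses zero twice per cycle, so the rate per sample
# should be roughly 2 * 400 / 8000 = 0.1 in every frame.
print(len(rates), min(rates), max(rates))
```

In practice such a per-frame value would be one column in a larger feature vector alongside features such as Mel-frequency cepstral coefficients, which typically require a signal-processing library.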
Conclusions: Infant cry classification methods have progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment to practical use. Further research is thus proposed to develop standardized foundational audio multimodal approaches, incorporating a broader range of audio features and ensuring data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse health care settings.
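To make the federated learning recommendation concrete, the sketch below shows one common aggregation rule, federated averaging, in which a server combines model parameters trained locally at each site, weighted by local sample count, so that raw cry recordings never leave the site. The abstract does not say which scheme the reviewed studies used; the function name, the two-parameter model, and the client sample counts are assumptions made purely for illustration.

```python
def federated_average(client_updates):
    """Sample-weighted mean of client parameter vectors.

    client_updates: list of (num_local_samples, parameter_list) pairs,
    one pair per participating site.
    """
    total = sum(n for n, _ in client_updates)
    if total == 0:
        raise ValueError("no training samples reported by any client")

    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, params in client_updates:
        if len(params) != dim:
            raise ValueError("clients must share one parameter layout")
        for i, value in enumerate(params):
            # Each client contributes in proportion to its local data size.
            averaged[i] += (n / total) * value
    return averaged


# Hypothetical round: two sites report locally trained parameters.
updates = [
    (100, [0.2, -1.0]),  # smaller site
    (300, [0.6, 1.0]),   # larger site, three times the weight
]
global_params = federated_average(updates)
print(global_params)
```

Only the parameter vectors and sample counts are exchanged here. A real deployment would add secure transport and often further protections, which are outside the scope of this sketch.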