A Systematic Review of Advances in Infant Cry Paralinguistic Classification: Methods, Implementation, and Applications
Authors: Geofrey Owino, Bernard Bernard Shibwabo
Journal: JMIR Rehabilitation and Assistive Technologies
Publication date: 2025-02-18
Publication type: Journal Article
DOI: 10.2196/69457 (https://doi.org/10.2196/69457)
Citation count: 0
Abstract
Background: Effective communication is essential for human interaction, yet infants can express their needs only through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.
Objective: This study systematically reviews advancements in the methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on infant cry classification techniques, feature extraction methods, and practical applications. It also aims to identify recent trends and directions in infant cry signal processing to address both academic and practical needs.
Methods: A systematic literature review was conducted using nine electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, and 126 studies met the eligibility criteria after screening by two independent reviewers. The methodological quality of the studies was assessed using the Cochrane risk-of-bias tool version 2 (RoB2): 92% (n=116) of the studies indicated a low risk of bias, and 8% (n=10) showed some concerns regarding bias. The overall quality assessment was performed using the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R version 3.64.
Results: Notable advancements in infant cry classification methods were realized, particularly from 2019 onwards, through machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients (MFCCs), spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile applications and web-based platforms for real-time analysis, while the remaining 90% (n=113) of the models were not deployed in real-world applications. Denoising techniques and federated learning, which enhance model robustness and ensure data confidentiality, were rarely employed, appearing in only 5% (n=6) of the studies. Practical applications spanned healthcare monitoring, diagnostics, and caregiver support.
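To make two of the features named above concrete, the following is a minimal sketch of frame-level zero-crossing rate and intensity (taken here as root-mean-square amplitude). It is an illustration only and is not drawn from any of the reviewed studies: the function names, the 25 ms frame and 10 ms hop at an assumed 16 kHz sample rate, and the synthetic test tone are all assumptions.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split samples into overlapping frames; a tail shorter than frame_len is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_intensity(frame):
    """Root-mean-square amplitude, used here as a simple intensity measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(signal, frame_len=400, hop=160):
    """Return one (zero-crossing rate, RMS intensity) pair per frame.

    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz,
    which is an assumed configuration, not one taken from the review.
    """
    return [(zero_crossing_rate(f), rms_intensity(f))
            for f in frame_signal(signal, frame_len, hop)]


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
```

Each element of `features` is a per-frame feature pair that a classifier could consume. Spectral features such as MFCCs or chroma require a Fourier transform and filter banks and are normally computed with a dedicated audio library rather than by hand.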
Conclusions: Infant cry classification methods have progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment in practical use. Further research is thus proposed to develop standardized foundational audio multimodal approaches that incorporate a broader range of audio features and ensure data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse healthcare settings.
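As an illustration of how federated learning can keep recordings confidential, the sketch below shows the aggregation step of federated averaging: each site trains on its own data and shares only model parameters, which a coordinator combines in proportion to local sample counts. This is a generic sketch and not a method described in the review. The function name, the three hypothetical sites, and their parameter values are assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: three sites, each holding a two-parameter local model.
client_weights = [[0.2, -1.0], [0.4, 0.0], [1.0, 2.0]]
client_sizes = [100, 300, 600]
global_weights = federated_average(client_weights, client_sizes)
```

Only the parameter lists leave each site in this scheme, so the raw cry audio stays local. A complete system would repeat local training and aggregation over many rounds.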