Comparative Study on the Generalization Ability of Machine Learning and Deep Learning Algorithms for Quality Assessment of Wearable PPG Recordings

IF 3.6 · Zone 3 (Computer Science) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access Pub Date : 2025-09-01 DOI:10.1109/ACCESS.2025.3604652

Santiago Mula Muñoz;Roberto Zangróniz;Óscar Ayo-Martín;José Joaquín Rieta;Raúl Alcaraz

IEEE Access, vol. 13, pp. 154031-154045, 2025. Full text: https://ieeexplore.ieee.org/document/11145778/

Abstract

One of the major challenges in using photoplethysmography (PPG) sensors for heart rate monitoring in real-world settings is ensuring signal quality. This work evaluates and compares quality assessment methods built on generic machine learning (ML) and deep learning (DL) pipelines within a comprehensive framework that spans different sensors, wavelengths, measurement locations, and recording environments. PPG signals from one proprietary and five publicly available datasets were labeled for quality by comparing the PPG-derived heart rate with a reference heart rate estimated from simultaneously recorded electrocardiograms. Diverse techniques based on common ML classifiers and one- and two-dimensional convolutional neural networks (CNNs) were trained on one dataset and tested on the remaining ones. Several of the resulting models performed comparably to previous studies when tested on datasets whose measurement positions and sensors resembled those of the training database. Specifically, some methods showed reductions in sensitivity, specificity, and F1-score of less than 3% from training to testing. In contrast, the models performed notably worse when tested on datasets whose conditions differed from those of the training data. Even the best-performing model, based on the well-known pre-trained CNN AlexNet, suffered a performance drop of over 20% in that situation. These findings show that the analyzed ML and DL methods lack the ability to generalize across PPG signals captured from diverse environments, sensors, wavelengths, and measurement locations. This suggests that developing case-specific methods might be the shortest path towards reliable PPG quality assessment.
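The labeling and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the agreement threshold, so `HR_TOLERANCE_BPM` is a hypothetical value, and the function names `label_quality` and `binary_metrics` are introduced here only for illustration.

```python
# Hypothetical tolerance: the abstract does not give the threshold used
# to decide whether PPG-derived and ECG-derived heart rates agree.
HR_TOLERANCE_BPM = 5.0


def label_quality(ppg_hr_bpm, ecg_hr_bpm, tol=HR_TOLERANCE_BPM):
    """Label each segment 1 (acceptable) when the PPG-derived heart rate
    is within `tol` bpm of the ECG reference, else 0 (unacceptable)."""
    return [1 if abs(p - e) <= tol else 0 for p, e in zip(ppg_hr_bpm, ecg_hr_bpm)]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1-score for binary quality labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Toy values, made up for illustration only (not data from the study).
ppg_hr = [72.0, 80.5, 110.0, 64.0]
ecg_hr = [71.0, 79.0, 90.0, 70.5]
labels = label_quality(ppg_hr, ecg_hr)

# Stand-in for the output of a quality classifier on the same segments.
predictions = [1, 0, 0, 1]
scores = binary_metrics(labels, predictions)
print(labels, scores)
```

In a cross-dataset protocol like the one the abstract outlines, the same metrics would be computed once on the training dataset and once on each held-out dataset, and the difference between the two would quantify the loss in generalization.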

Source journal

IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.