PCSCN: A quality assessment model for large-scale phonocardiogram data cleaning and enhancement
Yongzhuo Jiang, Hongbo Yang, Tao Guo, Hong Deng, Weilian Wang
Computer Methods and Programs in Biomedicine, vol. 271, Article 108977. Published 2025-07-31.
DOI: 10.1016/j.cmpb.2025.108977
https://www.sciencedirect.com/science/article/pii/S0169260725003943
Impact factor: 4.8. JCR: Q1 (Computer Science, Interdisciplinary Applications).
Citation count: 0
Abstract
Background:
The acquisition of large-scale phonocardiogram (PCG) data is crucial for clinical research and has significantly advanced the application of data-driven heart sound classification models. Owing to the complexity of clinical environments, the quality of collected PCG data cannot be guaranteed. The exclusion of low-quality signals is essential for conducting a reliable PCG analysis.
Methods:
This study introduces a novel quality assessment model, the Parallel Channel Sequence Convolutional Network (PCSCN). The PCSCN automatically and accurately detects and removes low-quality PCG signals, thereby improving dataset reliability and usability. Unlike previous methods that rely on manually extracting numerous statistical features, the PCSCN employs low-complexity features and a multi-channel sequence architecture, offering greater accuracy and efficiency. In addition, this study applies the PCSCN in a controlled experiment for database cleaning and enhancement to verify the impact of this process on downstream data-driven PCG classification models.
Results:
When tested on a public dataset, the PCSCN achieved an accuracy of 95.45% and an F1-score of 95.44%. In a database cleaning task involving PCG data from 7220 subjects, the PCSCN completed the task in 515.76 s. Furthermore, in the controlled experiment, the PCG classification model trained with PCSCN-enhanced data showed improved performance across multiple metrics.
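As a rough illustration of how figures like these are derived, the sketch below computes accuracy and a binary F1-score from confusion-matrix counts, and the average processing time per subject from the reported totals. The confusion counts are hypothetical placeholders, not the paper's data, and the abstract does not say whether its F1-score is binary, macro, or weighted, so the binary form here is an assumption.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all signals whose quality label was predicted correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, chosen only to exercise the formulas.
tp, fp, fn, tn = 90, 5, 10, 95
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)

# Totals reported in the abstract: 7220 subjects cleaned in 515.76 s.
subjects = 7220
total_seconds = 515.76
seconds_per_subject = total_seconds / subjects

print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
print(f"average time per subject = {seconds_per_subject * 1000:.1f} ms")
```

The per-subject figure is an average over subjects only; the abstract does not state how many recordings each subject contributed, so it should not be read as a per-recording latency.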
Conclusions:
The PCSCN is both reliable and efficient, and is poised to play a pivotal role in the cleaning of large-scale PCG databases. By improving the quality of the training data, the PCSCN significantly strengthens the clinical decision-making capabilities of PCG classification models, thereby increasing their value for clinical research and application.
About the journal:
The journal aims to encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; and to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.