ECG Beat classification: Impact of linear dependent samples
Christoph Hintermüller, Michael Hirnschrodt, Hermann Blessberger, Clemens Steinwender
Current Directions in Biomedical Engineering, vol. 17, pp. 23-26, published 2023-12-01
DOI: 10.1515/cdbme-2023-1207 (https://doi.org/10.1515/cdbme-2023-1207)
Abstract
The electrocardiogram (ECG) is a valuable clinical tool for assessing the electrical function of the heart. It provides insight into the different phases of the heartbeat and the various kinds of disorders that may affect them. The impact of linear dependency between feature signals on the classification outcome, and how to reduce it, has been extensively investigated and discussed in the literature. This study focuses on linear dependency between samples of imbalanced data sets, its relation to the observed overfitting with respect to majority classes, and how to reduce it. A set of 58 feature signals is used to train several LDA classifiers discriminating either 3 classes (Normal, Artefact, Arrhythmic) or 5 classes (Normal, Artefact, atrial premature contractions, ventricular premature contractions, and bundle branch blocks). The training data set is preprocessed using four sample reduction approaches and a nearest neighbour clustering method. For 5 classes, accuracies of 96.82% on the imbalanced data and 97.44% on the data preprocessed with the QR or SVD methods were obtained. For 3 classes, the corresponding accuracies were 97.68% and 98.12%. With the nearest neighbour clustering method, accuracies of only 96.00% for 5 classes and 97.37% for 3 classes could be achieved. The results clearly show that imbalanced ECG data does contain linearly dependent samples. These cause a bias towards the majority class, which is then overfitted by the classifier. Sample reduction methods and algorithms that are not aware of the presence of linearly dependent samples, such as the nearest neighbour clustering approach, further increase this bias or, even worse, destroy relevant information by merging samples that encode distinct aspects of the beat class.
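The abstract does not spell out how the QR-based sample reduction is carried out, so the following is only a minimal sketch of one plausible reading: within each class, keep a maximal set of numerically linearly independent samples, chosen by a greedy pivoted Gram-Schmidt pass (a simple form of QR with pivoting). The function names `select_independent_samples` and `reduce_per_class`, the tolerance `rel_tol`, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def select_independent_samples(X, rel_tol=1e-8):
    """Indices of rows of X forming a numerically independent set.

    Hypothetical helper: greedy pivoted Gram-Schmidt over samples (rows).
    """
    residual = np.array(X, dtype=float)  # working copy, deflated step by step
    norms = np.linalg.norm(residual, axis=1)
    if norms.size == 0 or norms.max() == 0.0:
        return np.array([], dtype=int)
    threshold = rel_tol * norms.max()  # assumed relative cut-off
    selected = []
    max_rank = min(residual.shape)  # rank cannot exceed samples or features
    while len(selected) < max_rank:
        norms = np.linalg.norm(residual, axis=1)
        pivot = int(np.argmax(norms))  # sample with the largest residual
        if norms[pivot] <= threshold:
            break  # every remaining sample depends on the selected ones
        q = residual[pivot] / norms[pivot]
        selected.append(pivot)
        # Remove the component along q from every sample.
        residual -= np.outer(residual @ q, q)
    return np.array(sorted(selected), dtype=int)


def reduce_per_class(X, y, rel_tol=1e-8):
    """Apply the selection separately to the samples of each class."""
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.extend(idx[select_independent_samples(X[idx], rel_tol)])
    keep = np.sort(np.array(keep, dtype=int))
    return X[keep], y[keep]


# Toy data: class 0 holds four samples spanning only two directions.
X = np.array([[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]], float)
y = np.array([0, 0, 0, 0, 1])
X_red, y_red = reduce_per_class(X, y)
print(X_red.shape, y_red)
```

In this reading, the number of samples kept per class can never exceed the number of features (58 in the study), which limits how far a majority class can outweigh the minority classes. Whether the authors apply the decomposition per class, use an affine rather than a linear notion of dependence, or choose a different tolerance cannot be determined from the abstract.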