Unsupervised feature extraction using deep learning empowers discovery of genetic determinants of the electrocardiogram.
Ewa Sieliwonczyk, Arunashis Sau, Konstantinos Patlatzoglou, Kathryn A McGurk, Libor Pastika, Prisca K Thami, Massimo Mangino, Sean L Zheng, George Powell, Lara Curran, Rachel J Buchan, Pantazis Theotokis, Nicholas S Peters, Bart Loeys, Daniel B Kramer, Jonathan W Waks, Fu Siong Ng, James S Ware
Genome Medicine 17(1):118. Published 2025-10-09. DOI: 10.1186/s13073-025-01510-z
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512816/pdf/
Abstract
Background: Electrocardiograms (ECGs) are widely used to assess cardiac health, but traditional clinical interpretation relies on a limited set of human-defined parameters. While advanced data-driven methods can outperform analyses of conventional ECG features for some tasks, they often lack interpretability. Variational autoencoders (VAEs), a form of unsupervised machine learning, can address this limitation by extracting ECG features that are both comprehensive and interpretable, known as latent factors. These latent factors provide a low-dimensional representation optimised to capture the full informational content of the ECG. The aim of this study was to develop a deep learning model to learn these latent ECG features, and to use this optimised feature set in genetic analyses to identify fundamental determinants of cardiac electrical function. This approach has the potential to expand our understanding of cardiac electrophysiology by uncovering novel phenotypic and genetic relationships.
Methods: Our novel VAE model was trained on a dataset comprising over one million secondary care median beat ECGs, with external validation in the UK Biobank (UKB). We performed common and rare variant association studies for VAE latent factors and conventional ECG traits on quality-controlled UKB data. Associated genetic variants were compared to loci for conventional ECG parameters available in the UKB and literature. Loci were considered novel if they were not previously associated with ECG traits in the GWAS Catalog and showed no known associations in nearby genes based on literature review. Novel GWAS associations were validated in a withheld subset of the UKB cohort. Additionally, we compared the associations of the VAE latent factors and conventional ECG traits with phenotypic traits, disease codes, and echocardiographic traits.
Results: The VAE identified 20 independent latent factors that captured ECG morphology with high accuracy (mean Pearson correlation: 0.95). GWAS of latent factors identified 65 unique loci, including 27 novel regions not associated with conventional ECG parameters in the same dataset. Six novel loci were not associated with the ECG in previous larger GWAS studies, including genes implicated in cardiac function and remodelling. Rare variant analysis identified seven additional genes with links to cardiac electrophysiology and remodelling. Phenotypic analyses revealed stronger and more comprehensive associations for latent factors compared to conventional traits, particularly for echocardiographic measures and cardiac phenotypes. Visualisations of latent factor alterations highlighted the interpretability of this approach.
Conclusions: Our study shows that the VAE provides a valuable tool for advancing our understanding of cardiac function and its genetic underpinnings, outperforming traditional approaches in genetic and phenotypic discovery.
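The reconstruction accuracy reported in the Results (mean Pearson correlation: 0.95) can be illustrated with a minimal sketch. It assumes the metric is computed per ECG between the original median beat and its VAE reconstruction and then averaged across ECGs; the abstract does not specify the exact procedure. The function names and the synthetic signals are hypothetical and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm


def mean_pearson(originals, reconstructions):
    """Average the per-ECG correlation over all (original, reconstruction) pairs."""
    scores = [pearson(o, r) for o, r in zip(originals, reconstructions)]
    return sum(scores) / len(scores)


# Synthetic stand-in data (assumption: NOT real ECGs). Each "beat" is a
# Gaussian bump of varying width; each "reconstruction" adds small noise.
random.seed(0)
n_samples = 400
beats = [
    [math.exp(-((t - 200) / (20.0 + k)) ** 2) for t in range(n_samples)]
    for k in range(50)
]
recons = [[v + random.gauss(0.0, 0.02) for v in beat] for beat in beats]

score = mean_pearson(beats, recons)
print(f"mean Pearson correlation: {score:.3f}")
```

A value near 1 means the reconstructions track the shape of the original beats closely. Because Pearson correlation ignores offset and scale, it measures agreement in morphology rather than in absolute amplitude.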
About the journal:
Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.