Unsupervised feature extraction using deep learning empowers discovery of genetic determinants of the electrocardiogram.

IF 10.4 · Zone 1 (Biology) · JCR Q1 · GENETICS & HEREDITY
Ewa Sieliwonczyk, Arunashis Sau, Konstantinos Patlatzoglou, Kathryn A McGurk, Libor Pastika, Prisca K Thami, Massimo Mangino, Sean L Zheng, George Powell, Lara Curran, Rachel J Buchan, Pantazis Theotokis, Nicholas S Peters, Bart Loeys, Daniel B Kramer, Jonathan W Waks, Fu Siong Ng, James S Ware
Genome Medicine 17(1):118. Journal Article. Published 2025-10-09.
DOI: 10.1186/s13073-025-01510-z (https://doi.org/10.1186/s13073-025-01510-z)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512816/pdf/
Citations: 0

Abstract


Unsupervised feature extraction using deep learning empowers discovery of genetic determinants of the electrocardiogram.

Background: Electrocardiograms (ECGs) are widely used to assess cardiac health, but traditional clinical interpretation relies on a limited set of human-defined parameters. While advanced data-driven methods can outperform analyses of conventional ECG features for some tasks, they often lack interpretability. Variational autoencoders (VAEs), a form of unsupervised machine learning, can address this limitation by extracting ECG features that are both comprehensive and interpretable, known as latent factors. These latent factors provide a low-dimensional representation optimised to capture the full informational content of the ECG. The aim of this study was to develop a deep learning model to learn these latent ECG features, and to use this optimised feature set in genetic analyses to identify fundamental determinants of cardiac electrical function. This approach has the potential to expand our understanding of cardiac electrophysiology by uncovering novel phenotypic and genetic relationships.
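To make the idea of latent factors concrete, the following is a minimal sketch of the two generic ingredients of a VAE objective: sampling latent factors with the reparameterisation trick, and penalising their divergence from a standard normal prior. It is not the authors' model. The abstract does not describe the architecture or loss, so the function names, the squared-error reconstruction term, and the `beta` weight are all assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent factor."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL term.

    `beta` is a hypothetical weighting knob, not a value taken from the paper.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    # 20 latent factors, matching the count reported in the Results.
    mu, log_var = [0.0] * 20, [0.0] * 20
    z = reparameterize(mu, log_var, rng)
    print(len(z), kl_to_standard_normal(mu, log_var))
```

In a full model, `mu` and `log_var` would be produced by an encoder network from the median beat, and `x_hat` by a decoder from `z`; both networks are omitted here.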

Methods: Our novel VAE model was trained on a dataset comprising over one million secondary care median beat ECGs, with external validation in the UK Biobank (UKB). We performed common and rare variant association studies for VAE latent factors and conventional ECG traits on quality-controlled UKB data. Associated genetic variants were compared to loci for conventional ECG parameters available in the UKB and literature. Loci were considered novel if they were not previously associated with ECG traits in the GWAS Catalog and showed no known associations in nearby genes based on literature review. Novel GWAS associations were validated in a withheld subset of the UKB cohort. Additionally, we compared the associations of the VAE latent factors and conventional ECG traits with phenotypic traits, disease codes, and echocardiographic traits.
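As a rough illustration of what a common-variant association test for one latent factor involves, the sketch below regresses a phenotype (here, a latent factor value per participant) on genotype dosage at a single variant and returns the effect estimate, its standard error, and the t statistic. This is an assumption-laden simplification: the abstract does not name the software or model used, and a real GWAS would also adjust for covariates and population structure. The function name and toy data are hypothetical.

```python
import math


def single_variant_association(dosages, phenotype):
    """Ordinary least squares of phenotype on genotype dosage (0, 1 or 2).

    Returns (beta, standard_error, t_statistic). Assumes at least three
    samples, some variation in dosage, and a non-zero residual variance.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_p = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotype))
    beta = sxy / sxx
    alpha = mean_p - beta * mean_g
    rss = sum((p - alpha - beta * g) ** 2 for g, p in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


if __name__ == "__main__":
    # Toy data: six participants, latent factor rises with allele count.
    dosages = [0, 1, 2, 0, 1, 2]
    latent_factor = [0.1, 1.0, 2.1, -0.1, 1.0, 1.9]
    print(single_variant_association(dosages, latent_factor))
```

Repeating such a test across millions of variants is what makes a stringent significance threshold necessary; 5e-8 is the conventional genome-wide cut-off, though the abstract does not state which threshold the study applied.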

Results: The VAE identified 20 independent latent factors that captured ECG morphology with high accuracy (mean Pearson correlation: 0.95). GWAS of latent factors identified 65 unique loci, including 27 novel regions not associated with conventional ECG parameters in the same dataset. Six novel loci were not associated with the ECG in previous larger GWAS studies, including genes implicated in cardiac function and remodelling. Rare variant analysis identified seven additional genes with links to cardiac electrophysiology and remodelling. Phenotypic analyses revealed stronger and more comprehensive associations for latent factors compared to conventional traits, particularly for echocardiographic measures and cardiac phenotypes. Visualisations of latent factor alterations highlighted the interpretability of this approach.
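The reconstruction accuracy quoted above is a mean Pearson correlation. One plausible reading, sketched below, is that each original median beat is correlated with its VAE reconstruction and the coefficients are averaged across ECGs; the abstract does not spell out the exact procedure, so treat the averaging scheme and the function names as assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_reconstruction_correlation(originals, reconstructions):
    """Average the per-ECG correlation between each beat and its reconstruction."""
    rs = [pearson(o, r) for o, r in zip(originals, reconstructions)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    # Toy signals standing in for median beats and their reconstructions.
    originals = [[0.0, 1.0, 0.2, -0.1], [0.1, 0.9, 0.3, 0.0]]
    reconstructions = [[0.0, 0.9, 0.3, -0.1], [0.1, 0.8, 0.3, 0.1]]
    print(mean_reconstruction_correlation(originals, reconstructions))
```

Because Pearson correlation is invariant to scaling and offset, a high value indicates that waveform shape is preserved, not that amplitudes match exactly.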

Conclusions: Our study shows that the VAE provides a valuable tool for advancing our understanding of cardiac function and its genetic underpinnings, outperforming traditional approaches in genetic and phenotypic discovery.

Source journal
Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
About the journal: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.