GenECG: a synthetic image-based ECG dataset to augment artificial intelligence-enhanced algorithm development.

Neil Bodagh, Kyaw Soe Tun, Adam Barton, Malihe Javidi, Darwon Rashid, Rachel Burns, Irum Kotadia, Magda Klis, Ali Gharaviri, Vinush Vigneswaran, Steven Niederer, Mark O'Neill, Miguel O Bernabeu, Steven E Williams
Journal: BMJ Health & Care Informatics, vol. 32, no. 1. Published 2025-05-31. Publication type: Journal Article. Impact factor 4.4 (JCR Q1, Health Care Sciences & Services).
DOI: https://doi.org/10.1136/bmjhci-2024-101335
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12142132/pdf/

Abstract

Objectives: An image-based ECG dataset incorporating visual imperfections common to paper-based ECGs, which are typically scanned or photographed into electronic health records, could facilitate clinically useful artificial intelligence (AI)-ECG algorithm development. This study aimed to create a high-fidelity, synthetic image-based ECG dataset.

Methods: ECG images were recreated from the PTB-XL database, a signal-based dataset, and image manipulation techniques were applied to mimic the imperfections associated with ECGs in real-world settings. Clinical Turing tests were conducted to evaluate the fidelity of the synthetic images, and the performance of current AI-ECG algorithms was assessed using synthetic images containing visual imperfections.
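
As a rough illustration of what "image manipulation techniques" can mean in practice, the following minimal sketch degrades a clean grayscale image with three simple effects: uneven lighting, additive noise and a dark fold crease. It is not the authors' pipeline, which the abstract does not describe. The function name `add_imperfections`, its parameters and the specific effects are assumptions chosen for illustration, and the code uses only the Python standard library so that it runs without extra dependencies.

```python
import random


def add_imperfections(img, seed=0, noise_sd=8.0, shadow_strength=0.3):
    """Return a degraded copy of a grayscale image given as rows of 0-255 ints.

    Hypothetical helper for illustration only; the effects and parameter
    values are assumptions, not taken from the GenECG paper.
    """
    rng = random.Random(seed)
    height, width = len(img), len(img[0])
    crease_row = rng.randrange(height)  # row that will carry the fold crease
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, pixel in enumerate(row):
            # Uneven lighting: darken progressively towards the right edge.
            shade = 1.0 - shadow_strength * x / max(width - 1, 1)
            value = pixel * shade
            # Scanner or camera noise: additive Gaussian noise.
            value += rng.gauss(0.0, noise_sd)
            # Fold crease: one row is rendered darker.
            if y == crease_row:
                value *= 0.5
            # Clamp back into the valid 8-bit range.
            new_row.append(min(255, max(0, round(value))))
        out.append(new_row)
    return out


# Stand-in "clean" image: white paper with a single black trace line.
clean = [[255] * 60 for _ in range(40)]
clean[20] = [0] * 60
degraded = add_imperfections(clean)
```

Pairing each `degraded` image with its `clean` source mirrors the paired structure described for the dataset, where imperfect images are matched with imperfection-free ones.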

Results: GenECG, an image-based dataset containing 21 799 ECGs with visual imperfections encountered in routine clinical care paired with imperfection-free images, was created. Turing tests confirmed the realism of the images: expert observer accuracy of discrimination between real-world and synthetic ECGs fell from 63.9% (95% CI 58.0% to 69.8%) to 53.3% (95% CI 48.6% to 58.1%) over three rounds of testing, indicating that observers could not distinguish between synthetic and real ECGs. The performance of pre-existing algorithms on synthetic (area under the curve (AUC) 0.592, 95% CI 0.421 to 0.763) and real-world (AUC 0.647, 95% CI 0.520 to 0.774) ECG images containing imperfections was limited. Algorithm fine-tuning with GenECG data improved real-world ECG classification accuracy (AUC 0.821, 95% CI 0.730 to 0.913), demonstrating its potential to augment image-based algorithm development.
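
The Turing test result rests on whether the confidence interval for observer accuracy includes 50%, the accuracy expected from guessing. The sketch below shows that reasoning with a normal-approximation (Wald) interval for a proportion. The abstract does not state how its intervals were computed or how many ratings were collected, so the method, the function name `wald_ci` and the counts (160 correct out of 300) are all assumptions, picked only to give a proportion near the reported 53.3%.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval.

    z=1.96 corresponds to a two-sided 95% interval. This is one common
    method and is assumed here; the paper may have used another.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not taken from the paper.
accuracy, lower, upper = wald_ci(160, 300)

# If 0.5 lies inside the interval, the observed accuracy is statistically
# compatible with guessing between real and synthetic images.
includes_chance = lower <= 0.5 <= upper
```

Under these assumed counts the interval spans 0.5, the same pattern as the final-round interval reported above (48.6% to 58.1%), whereas the first-round interval (58.0% to 69.8%) lies entirely above 50%.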

Discussion/conclusion: GenECG is the first synthetic image-based ECG dataset to pass a clinical Turing test. The dataset will enable image-based AI-ECG algorithm development, ensuring utility in low-resource areas, prehospital settings and hospital environments where signal data are unavailable.
