Siamese neural network-enhanced electrocardiography can re-identify anonymized healthcare data.

Impact factor: 3.9 · JCR Q1 · Cardiac & Cardiovascular Systems
European heart journal. Digital health · Pub Date: 2025-02-25 · eCollection Date: 2025-05-01 · DOI: 10.1093/ehjdh/ztaf011
Krzysztof Macierzanka, Arunashis Sau, Konstantinos Patlatzoglou, Libor Pastika, Ewa Sieliwonczyk, Mehak Gurnani, Nicholas S Peters, Jonathan W Waks, Daniel B Kramer, Fu Siong Ng
European heart journal. Digital health, volume 6, issue 3, pages 417-426. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12088719/pdf/

Abstract

Aims: Many research databases with anonymized patient data contain electrocardiograms (ECGs) from which traditional identifiers have been removed. We evaluated the ability of artificial intelligence (AI) methods to determine the similarity between ECGs and assessed whether they have the potential to be misused to re-identify individuals from anonymized datasets.

Methods and results: We used a convolutional Siamese neural network (SNN) architecture, which derives a Euclidean distance similarity metric between two input ECGs. A secondary care dataset of 864 283 ECGs (72 455 subjects) was used. The Siamese neural network-electrocardiogram (SNN-ECG) model achieves an accuracy of 91.68% when classifying between 2 689 124 same-subject pairs and 2 689 124 different-subject pairs. This performance increases to 93.61% and 95.97% in the outpatient and normal ECG subsets, respectively. In a simulated 'motivated intruder' test, SNN-ECG can identify individuals from large datasets. In datasets of 100, 1000, 10 000, and 20 000 ECGs, each containing only one ECG that is also from the reference individual, it achieves success rates of 79.2%, 62.6%, 45.0%, and 40.0%, respectively. Random selection would succeed in 1%, 0.1%, 0.01%, and 0.005% of cases, respectively. Additional basic information, such as subject sex or age range, enhances performance further. We also found that, at the subject level, ECG pair similarity is clinically relevant: greater ECG dissimilarity is associated with all-cause mortality [hazard ratio, 1.22 (1.21-1.23), P < 0.0001] and is additive to an AI-ECG model trained for mortality prediction.
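The 'motivated intruder' test described above can be illustrated with a minimal sketch. It assumes each ECG has already been mapped to an embedding vector, and it is not the authors' implementation: the function names, the embedding dimension, the noise level, and the synthetic Gaussian embeddings are all hypothetical stand-ins for SNN outputs. The sketch shows the Euclidean distance metric, a nearest-neighbour search for the single matching ECG, and the random-chance baseline of 1/N quoted in the abstract. The simulated success rate reflects only the synthetic data and says nothing about the reported results.

```python
import math
import random


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intruder_match(reference, candidates):
    """Index of the candidate embedding closest to the reference embedding."""
    return min(
        range(len(candidates)),
        key=lambda i: euclidean_distance(reference, candidates[i]),
    )


def chance_rate(n_candidates):
    """Success rate of picking one candidate uniformly at random."""
    return 1.0 / n_candidates


def simulate(n_candidates, dim=8, noise=0.3, trials=200, seed=0):
    """Top-1 re-identification rate on synthetic embeddings (assumption:
    a second ECG from the same subject is the subject's embedding plus
    small Gaussian noise; real SNN embeddings will behave differently)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One hypothetical embedding per ECG in the 'anonymized' dataset.
        dataset = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_candidates)
        ]
        target = rng.randrange(n_candidates)
        # The intruder's reference ECG: a noisy repeat of the target subject.
        reference = [x + rng.gauss(0, noise) for x in dataset[target]]
        hits += int(intruder_match(reference, dataset) == target)
    return hits / trials


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, "chance:", chance_rate(n), "simulated:", simulate(n, trials=50))
```

The chance baseline follows directly from the dataset size: with exactly one matching ECG among N candidates, a uniform random guess succeeds with probability 1/N, which gives 1% for N = 100 and 0.005% for N = 20 000.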

Conclusion: Anonymized ECGs retain information that may facilitate subject re-identification, raising privacy and data protection concerns. However, SNN-ECG models also have positive uses and can enhance risk prediction of cardiovascular disease.
