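Design and technical validation to generate a synthetic 12-lead electrocardiogram dataset to promote artificial intelligence research.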

IF 3.4 | CAS Zone 3 (Medicine) | JCR Q1, MEDICAL INFORMATICS

Health Information Science and Systems, Vol. 11, No. 1, Article 41. Published: 2023-08-30; eCollection date: 2023-12-01. DOI: 10.1007/s13755-023-00241-y

Hakje Yoo, Jose Moon, Jong-Ho Kim, Hyung Joon Joo

Citations: 1

Abstract

Purpose: The purpose of this study is to construct a synthetic ECG signal dataset that overcomes the sensitivity of personal information and the complexity of data disclosure policies.

Methods: The public dataset was constructed by generating synthetic data with a deep learning model based on a convolutional neural network (CNN) and a bidirectional long short-term memory (Bi-LSTM) network. The effectiveness of the dataset was then verified by developing classification models for ECG diagnoses.

Results: The generated synthetic 12-lead ECG dataset consists of 6000 ECGs in one normal group and five abnormal groups. The synthetic ECG signals have waveform patterns similar to those of the original ECG signals: the average RMSE between the two signals is 0.042 µV, and the average cosine similarity is 0.993. In addition, five classification models were developed to verify the usefulness of the synthetic dataset, and they performed similarly to models built with the real dataset. In particular, when the real dataset was used as the test set for models trained on the synthetic dataset, all models still achieved high classification accuracy (average accuracy 93.41%).
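The two similarity figures above follow standard definitions: for an original signal x and a synthetic signal y of length n, RMSE = sqrt((1/n) Σ (x_i − y_i)²) and cosine similarity = (x · y) / (‖x‖ ‖y‖). The abstract does not say whether these were computed per lead, per beat, or over whole records, so the sketch below assumes per-lead computation averaged over the paired leads. The function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(original: Sequence[float], synthetic: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length signals."""
    if len(original) != len(synthetic) or not original:
        raise ValueError("signals must be non-empty and the same length")
    squared = [(o - s) ** 2 for o, s in zip(original, synthetic)]
    return math.sqrt(sum(squared) / len(squared))


def cosine_similarity(original: Sequence[float], synthetic: Sequence[float]) -> float:
    """Cosine of the angle between two signals treated as vectors."""
    if len(original) != len(synthetic) or not original:
        raise ValueError("signals must be non-empty and the same length")
    dot = sum(o * s for o, s in zip(original, synthetic))
    norm_o = math.sqrt(sum(o * o for o in original))
    norm_s = math.sqrt(sum(s * s for s in synthetic))
    if norm_o == 0 or norm_s == 0:
        raise ValueError("cosine similarity is undefined for an all-zero signal")
    return dot / (norm_o * norm_s)


def mean_lead_metrics(original_leads, synthetic_leads):
    """Assumption: score each lead separately, then average over leads (e.g. 12)."""
    if len(original_leads) != len(synthetic_leads) or not original_leads:
        raise ValueError("need the same non-zero number of leads on both sides")
    pairs = list(zip(original_leads, synthetic_leads))
    avg_rmse = sum(rmse(o, s) for o, s in pairs) / len(pairs)
    avg_cos = sum(cosine_similarity(o, s) for o, s in pairs) / len(pairs)
    return avg_rmse, avg_cos
```

A perfect copy scores RMSE 0 and cosine similarity 1. Cosine similarity ignores amplitude scaling (y = 2x still scores 1), which is one reason it is usually reported alongside an amplitude-sensitive measure such as RMSE.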

Conclusion: The synthetic 12-lead ECG dataset was confirmed to perform similarly to real-world 12-lead ECGs in classification models. This implies that a synthetic dataset can perform similarly to a real dataset in AI-based clinical research. The synthetic dataset generation process in this study offers a way to overcome medical data disclosure challenges imposed by privacy rights and to encourage open data policies, and it can contribute significantly to promoting cardiovascular disease research.
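The central check behind this conclusion, training on synthetic ECGs and testing on real ones (often called "train on synthetic, test on real", or TSTR), can be sketched as follows. The abstract does not name the five classifiers or their input features, so this sketch swaps in a deliberately simple nearest-centroid classifier over hypothetical feature vectors. All names are illustrative; this is not the authors' pipeline.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record format: (feature vector, diagnosis label).
Example = Tuple[Sequence[float], str]


def fit_centroids(train: List[Example]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label (nearest-centroid 'training')."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids: Dict[str, List[float]], test: List[Example]) -> float:
    """Fraction of test examples whose predicted label matches the true label."""
    if not test:
        raise ValueError("test set must not be empty")
    correct = sum(predict(centroids, features) == label for features, label in test)
    return correct / len(test)


def train_synthetic_test_real(synthetic_train: List[Example], real_test: List[Example]) -> float:
    """TSTR: fit only on synthetic records, then score only on real records."""
    return accuracy(fit_centroids(synthetic_train), real_test)
```

Comparing this score with the accuracy of the same model trained on real data and tested on real data gives the kind of synthetic-versus-real comparison the abstract reports.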

Source journal

Health Information Science and Systems (MEDICAL INFORMATICS)

CiteScore: 11.30
Self-citation rate: 5.00%

Journal description: Health Information Science and Systems is a multidisciplinary journal that integrates artificial intelligence/computer science/information technology with health science and services, embracing information science research coupled with topics related to the modeling, design, development, integration and management of health information systems, smart health, artificial intelligence in medicine, computer-aided diagnosis, and medical expert systems. The scope includes:
i.) smart health, artificial intelligence in medicine, computer-aided diagnosis, medical image processing, and medical expert systems;
ii.) medical big data and medical/health/biomedicine information resources, such as patient medical records, devices and equipment, and software and tools to capture, store, retrieve, process, analyze, and optimize the use of information in the health domain;
iii.) data management, data mining, and knowledge discovery, all of which play a key role in decision making, management of public health, examination of standards, and privacy and security issues;
iv.) development of new architectures and applications for health information systems.