Preparing clinical research data for artificial intelligence readiness: insights from the National Institute of Diabetes and Digestive and Kidney Diseases data centric challenge.

Impact factor 4.6; Zone 2, Medicine; JCR Q1, Computer Science, Information Systems
Marcin J Domagalski, Yin Lu, Alexander Pilozzi, Alicia Williamson, Padmini Chilappagari, Emma Luker, Courtney D Shelley, Anya Dabic, Michael A Keller, Rebecca M Rodriguez, Sharon Lawlor, Ratna R Thangudu
Journal of the American Medical Informatics Association. Journal article, published 2025-07-24. DOI: 10.1093/jamia/ocaf114 (https://doi.org/10.1093/jamia/ocaf114)

Abstract

Objectives: The success of artificial intelligence (AI) and machine learning (ML) approaches in biomedical research depends on the quality of the underlying data. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Centric Challenge was designed to address the challenge of making raw clinical research data AI ready, with a focus on type 1 diabetes studies available in the NIDDK Central Repository (NIDDK-CR). This paper aims to present a structured methodology for enhancing the AI readiness of clinical datasets.

Materials and methods: We detail a systematic approach to data aggregation and preprocessing, including binning continuous data, processing text features, managing missing values, and encoding categorical variables, while maintaining data integrity and compatibility with ML algorithms.
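
The four preprocessing steps named above can be sketched on toy data. This is a minimal illustration only, not the authors' pipeline: the column names (`age`, `hba1c`, `sex`, `note`), the bin edges, the median imputation, and the helper functions `bin_value` and `prepare` are all assumptions introduced here.

```python
from statistics import median

# Hypothetical toy records; column names and values are illustrative only.
records = [
    {"age": 12, "hba1c": 7.2, "sex": "F", "note": "  Insulin Pump "},
    {"age": 34, "hba1c": None, "sex": "M", "note": "MDI"},
    {"age": 58, "hba1c": 8.9, "sex": None, "note": None},
]


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value.

    labels must have one more entry than edges; the last label catches
    everything at or above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def prepare(rows):
    """Apply binning, imputation, one-hot encoding, and text cleanup."""
    # Median of the observed values, used to fill missing hba1c (assumed strategy).
    fill = median(r["hba1c"] for r in rows if r["hba1c"] is not None)
    # Category levels seen in the data, in a stable order.
    levels = sorted({r["sex"] for r in rows if r["sex"] is not None})

    prepared = []
    for r in rows:
        row = {}
        # 1. Bin a continuous variable (assumed edges: 18 and 40 years).
        row["age_group"] = bin_value(
            r["age"], [18, 40], ["child", "adult", "older_adult"]
        )
        # 2. Manage missing values: keep a flag, then impute.
        row["hba1c_missing"] = int(r["hba1c"] is None)
        row["hba1c"] = fill if r["hba1c"] is None else r["hba1c"]
        # 3. One-hot encode a categorical variable, with its own missing flag.
        for level in levels:
            row[f"sex_{level}"] = int(r["sex"] == level)
        row["sex_missing"] = int(r["sex"] is None)
        # 4. Normalize a free-text feature (trim whitespace, lowercase).
        row["note"] = (r["note"] or "").strip().lower()
        prepared.append(row)
    return prepared


prepared = prepare(records)
for row in prepared:
    print(row)
```

Keeping an explicit missing-value flag next to each imputed or encoded column is one way to preserve the information that a value was absent, which is in the spirit of the data-integrity goal stated above.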

Results: We applied the proposed methodology to transform raw clinical data from type 1 diabetes studies in the NIDDK-CR into a structured, AI-ready dataset. The evaluation process validated the effectiveness of our AI-readiness enhancement steps and explored the potential use cases in type 1 diabetes research.

Discussion: The methodology discussed in this paper will serve as guidance for preparing data for AI-driven clinical research, and the resulting AI-ready data can serve as a training resource for building AI/ML models and improving their performance.

Conclusion: We present a generalizable framework for preparing clinical research data for AI applications. The resulting datasets lay a strong foundation for downstream AI/ML applications, setting the stage for a new era of data-driven discoveries.

Source journal
Journal of the American Medical Informatics Association
Category: Medicine - Computer Science, Interdisciplinary Applications
CiteScore: 14.50
Self-citation rate: 7.80%
Articles published: 230
Review time: 3-8 weeks
Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.