Artificial intelligence methods and approaches to improve data quality in healthcare data

Impact factor: 5.4

Artificial intelligence in the life sciences, Pub Date: 2025-07-04, DOI: 10.1016/j.ailsci.2025.100135

Jarmakoviča Agate

Volume 8, Article 100135 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S266731852500011X

Citations: 0

Abstract

This study explores artificial intelligence (AI) methods and approaches used to improve data quality, with a particular focus on healthcare data. Applying a systematic literature review based on the PRISMA framework, the research examines publications from 2020 to 2025 that analyze AI applications across key data quality dimensions—accuracy, completeness, consistency, timeliness, uniqueness, and validity. The study aims to identify which AI methods are most commonly employed and how they align with these quality attributes. A conceptual map was developed to visualize the relationships between dimensions and AI techniques such as deep learning, federated learning, data-centric AI, and ontology-based data governance. Findings reveal that accuracy and consistency are the most emphasized dimensions in the literature, with methods like supervised learning, NLP, and isolation forest frequently applied. In contrast, dimensions like timeliness and validity receive comparatively limited attention. The study concludes that certain AI methods—particularly data-centric and cross-cutting approaches—are effective in addressing multiple data quality challenges simultaneously. These insights offer practical guidance for selecting AI strategies in healthcare data quality improvement and highlight areas for future research.
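The abstract lists isolation forest among the methods most often applied to the accuracy and consistency dimensions. The sketch below is a minimal, dependency-free illustration of the general isolation forest idea: random axis-aligned splits isolate implausible records in fewer steps than plausible ones, and a short average path length yields a score close to 1. It is not the implementation used in any of the reviewed studies. The class and function names, the parameter defaults, and the vital-sign records are all hypothetical and chosen only for illustration. In practice a maintained implementation such as scikit-learn's `sklearn.ensemble.IsolationForest` would normally be preferred.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Record = Tuple[float, ...]
EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n items; normalises depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int = 0                  # records that reached this leaf during training
    feature: int = -1              # column used for the split (internal nodes only)
    split: float = 0.0             # split threshold (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Record], depth: int, limit: int, rng: random.Random) -> Node:
    """Partition points with random splits until each is isolated or the depth limit is hit."""
    if depth >= limit or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # this column cannot separate the remaining points
        return Node(size=len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(feature=feature, split=split,
                left=build_tree(left, depth + 1, limit, rng),
                right=build_tree(right, depth + 1, limit, rng))


def path_length(node: Node, x: Record) -> float:
    """Depth at which x reaches a leaf, plus the expected extra depth for that leaf's size."""
    depth = 0
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    return depth + c_factor(node.size)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0) -> None:
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, data: Sequence[Record]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two records")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        limit = math.ceil(math.log2(self.psi))  # roughly the average height of a random tree
        self.trees = [build_tree(rng.sample(list(data), self.psi), 0, limit, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Record) -> float:
        """Anomaly score in (0, 1]; values near 1 mean x was isolated after very few splits."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c_factor(self.psi))


# Hypothetical vital-sign records: (heart rate in bpm, body temperature in degrees C).
# The last row mimics a data-entry error; none of these values come from the study.
records = [
    (72, 36.8), (75, 37.0), (68, 36.6), (80, 37.1), (77, 36.9), (70, 36.7),
    (74, 36.8), (66, 36.5), (79, 37.2), (73, 36.9), (400, 25.0),
]
forest = IsolationForest(n_trees=100, sample_size=256, seed=42).fit(records)
scores = [forest.score(r) for r in records]
suspect = max(range(len(records)), key=scores.__getitem__)
print(f"most anomalous record: {records[suspect]} (score {scores[suspect]:.2f})")
```

Because the score only ranks how easily a record is separated from the rest, a flagged row such as the 400 bpm entry is best treated as a candidate accuracy problem for review rather than as an automatic correction.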


Source journal: Artificial intelligence in the life sciences
Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)
CiteScore: 5.00
Self-citation rate: 0.00%
Review time: 15 days