Machine learning and artificial intelligence in type 2 diabetes prediction: a comprehensive 33-year bibliometric and literature analysis.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in Digital Health; Pub Date: 2025-03-27; eCollection Date: 2025-01-01; DOI: 10.3389/fdgth.2025.1557467

Mahreen Kiran, Ying Xie, Nasreen Anjum, Graham Ball, Barbara Pierscionek, Duncan Russell

Volume 7, article 1557467. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11983615/pdf/

Citations: 0

Abstract

Background: Type 2 Diabetes Mellitus (T2DM) remains a critical global health challenge, necessitating robust predictive models to enable early detection and personalized interventions. This study presents a comprehensive bibliometric and systematic review of 33 years (1991-2024) of research on machine learning (ML) and artificial intelligence (AI) applications in T2DM prediction. It highlights the growing complexity of the field and identifies key trends, methodologies, and research gaps.

Methods: A systematic methodology guided the literature selection process, starting with keyword identification using Term Frequency-Inverse Document Frequency (TF-IDF) and expert input. Based on these refined keywords, literature was systematically selected using PRISMA guidelines, resulting in a dataset of 2,351 articles from Web of Science and Scopus databases. Bibliometric analysis was performed on the entire selected dataset using tools such as VOSviewer and Bibliometrix, enabling thematic clustering, co-citation analysis, and network visualization. To assess the most impactful literature, a dual-criteria methodology combining relevance and impact scores was applied. Articles were qualitatively assessed on their alignment with T2DM prediction using a four-point relevance scale and quantitatively evaluated based on citation metrics normalized within subject, journal, and publication year. Articles scoring above a predefined threshold were selected for detailed review. The selected literature spans four time periods: 1991-2000, 2001-2010, 2011-2020, and 2021-2024.
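The keyword-identification step can be illustrated with a minimal TF-IDF sketch. The abstract does not describe the authors' tokenization, weighting variant, or corpus, so everything here is an assumption for illustration: whitespace tokenization, the plain `tf * log(N / df)` weighting, scores summed across documents, and the hypothetical function name `tfidf_keywords`.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank terms by summed TF-IDF weight across a small corpus.

    Illustrative only: whitespace tokenization and the unsmoothed
    tf * log(N / df) weighting are assumptions, not the paper's method.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)            # relative term frequency
            idf = math.log(n_docs / df[term])   # 0 for terms in every document
            scores[term] += tf * idf

    return [term for term, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    # Toy corpus (invented for this example).
    sample = [
        "diabetes prediction model",
        "diabetes risk model",
        "diabetes federated learning",
    ]
    print(tfidf_keywords(sample, top_k=4))
```

A term that occurs in every document, such as "diabetes" in the toy corpus, receives an IDF of zero and drops to the bottom of the ranking. This is why TF-IDF surfaces discriminating search terms rather than ubiquitous ones, and why expert input is still needed to retain essential domain terms.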

Results: The bibliometric findings reveal exponential growth in publications since 2010, with the USA and UK leading contributions, followed by emerging players like Singapore and India. Key thematic clusters include foundational ML techniques, epidemiological forecasting, predictive modelling, and clinical applications. Ensemble methods (e.g., Random Forest, Gradient Boosting) and deep learning models (e.g., Convolutional Neural Networks) dominate recent advancements. Literature analysis reveals that early studies primarily used demographic and clinical variables, while recent efforts integrate genetic, lifestyle, and environmental predictors. It also highlights advances in integrating real-world datasets, emerging trends like federated learning, and explainability tools such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations).

Conclusion: Future work should address gaps in generalizability, interdisciplinary T2DM prediction research, and psychosocial integration, while also focusing on clinically actionable solutions and real-world applicability to combat the growing diabetes epidemic effectively.
