Genetic biomarkers and machine learning techniques for predicting diabetes: systematic review

IF 10.7 · Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review, Volume 58, Issue 2 · Published: 2024-12-20 · DOI: 10.1007/s10462-024-11020-w

Sulaiman Khan, Farida Mohsen, Zubair Shah


Citations: 0

Abstract

Diabetes mellitus is a chronic metabolic disorder characterized by high blood glucose resulting from impaired insulin secretion, impaired insulin action, or both. It is one of the fastest-growing diseases worldwide and is projected to affect 693 million adults by 2045. The rising prevalence of diabetes and its complications (kidney disease, retinopathy, and neuropathy) underscores the need for predictive models that enable early diagnosis and intervention. These complications increase mortality and lead to blindness, kidney failure, and an overall reduced quality of life for people living with diabetes. Although clinical risk factors and glycemic control provide valuable insights, they alone cannot reliably predict the onset of vascular complications. Genetic biomarkers and machine learning techniques have emerged as promising tools for predicting the risk of developing diabetes and its complications. Despite the emergence of numerous AI models for diabetes prediction, a thorough review of their progress and challenges is still lacking. To address this gap, this paper presents a systematic review of the literature on AI-based models for diabetes identification, following the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines. Our review found that multimodal diabetes prediction models outperformed unimodal models. Most studies used classical machine learning models; single-nucleotide polymorphisms (SNPs) were the most frequently used data type, followed by gene expression profiles, while lipidomic and metabolomic data were the least used. Moreover, some studies that sought to identify genetic determinants of diabetes complications relied on familial linkage analysis, which is suited to detecting loci with strong effects. However, these approaches had limitations, including susceptibility to false positives in candidate gene studies and AI models that were underpowered because of small sample sizes.
The landscape changed dramatically with the proliferation of genomic datasets, driven by the emergence of biobanks and the pooling of cohorts worldwide. This surge has more than doubled the number of AI-enabled genetic discoveries related to diabetes and its complications. Here we focus on these genetic breakthroughs, particularly those enabled by AI models, while also highlighting existing research gaps and the advances needed to take genomic discovery to the next level.
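To make the idea of a multimodal input concrete, the sketch below shows one common way to combine two of the data types named above, SNP genotypes and gene expression profiles, into a single feature vector per sample before training a classical model. It is illustrative only and rests on assumptions not stated in the abstract: SNPs use additive coding (0, 1 or 2 copies of the minor allele), expression values are z-scored per gene, and the two modalities are joined by simple feature-level (early) fusion. The function names `encode_snp`, `zscore_columns` and `early_fusion` are hypothetical, and the review does not specify which fusion strategy the surveyed studies used.

```python
from statistics import mean, pstdev


def encode_snp(genotype: str, minor_allele: str) -> int:
    """Additive coding (assumed convention): count copies of the minor allele, giving 0, 1 or 2."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == minor_allele)


def zscore_columns(rows):
    """Standardise each column (one gene) to mean 0 and population SD 1 across samples."""
    columns = list(zip(*rows))
    # Fall back to SD 1.0 for constant columns to avoid division by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(snp_rows, minor_alleles, expr_rows):
    """Hypothetical early fusion: concatenate SNP dosages with standardised expression values."""
    if len(snp_rows) != len(expr_rows):
        raise ValueError("both modalities must describe the same samples")
    expr_z = zscore_columns(expr_rows)
    fused = []
    for genotypes, expr in zip(snp_rows, expr_z):
        dosages = [encode_snp(g, a) for g, a in zip(genotypes, minor_alleles)]
        fused.append(dosages + expr)
    return fused


if __name__ == "__main__":
    # Three toy samples, two SNPs (minor allele G at both) and one gene.
    snps = [["AA", "AG"], ["AG", "GG"], ["GG", "AA"]]
    expression = [[1.0], [2.0], [3.0]]
    for row in early_fusion(snps, ["G", "G"], expression):
        print(row)
```

Each fused row can then be passed to any classical classifier. Early fusion is only one option; studies may instead train one model per modality and combine their predictions (late fusion).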




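Source journal: Artificial Intelligence Review (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months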

Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.