Attention-driven graph-based machine learning for non-invasive diagnosis of NAFLD

Intelligence-Based Medicine. Pub Date: 2025-01-01. DOI: 10.1016/j.ibmed.2025.100288

Ekta Srivastava, Sarath Mohan, Tapan Kumar Gandhi, Ashok Kumar Choudhury, Sandeep Kumar

Volume 12, Article 100288. Full text: https://www.sciencedirect.com/science/article/pii/S2666521225000924

Citation count: 0

Abstract

An estimated 25%–30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD), a silent yet progressive condition that can advance from simple steatosis to severe stages like non-alcoholic steatohepatitis (NASH), fibrosis, and cirrhosis, significantly heightening the risk of liver cancer. Currently, the gold-standard method for staging NAFLD is liver biopsy, an invasive procedure with risks such as bleeding, infection, and sampling error. Due to its high cost and impracticality for routine monitoring, there is a critical need for reliable, non-invasive diagnostic tools capable of effectively identifying NAFLD stages. We developed a graph-based framework in which each patient is represented as a node in a similarity network. Edges are formed via k-nearest neighbors (KNN) on standardized clinical and biochemical features, with missing values imputed by KNN to preserve biologically plausible variability. A two-layer Graph Attention Network (GAT) then learns edge-specific attention weights to focus on the most informative inter-patient relationships. Tested on a proprietary ILBS cohort (n = 622), our model achieved 75.2% accuracy (AUC = 0.768; F1 = 0.752), an 11% absolute improvement over Support Vector Machines and Random Forests, and demonstrated robustness in 10-fold cross-validation and adversarial noise tests. On a separate public dataset (n = 80) spanning lipidomic, glycomic, fatty acid, and hormone panels, it exceeded 99% accuracy (AUC > 0.99). Attention-based explanations further highlighted key patient similarities driving each prediction. These findings suggest that attention-driven graph learning can clearly improve non-invasive NAFLD staging, enabling early detection and supporting personalized disease monitoring in diverse clinical settings.
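
The pipeline described above (standardize features, link each patient to its k nearest neighbors, then weight those links by attention) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The toy feature values, `k = 2`, the Euclidean metric, and the fixed attention vector `a` are all assumptions made for the example. A real GAT learns a projection matrix and the attention vector during training, usually includes self-loops, and stacks two such layers; KNN imputation of missing values is omitted here.

```python
import math

def standardize(rows):
    """Z-score each feature column (constant columns map to 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)] for row in rows]

def knn_edges(features, k):
    """Map each node to its k nearest other nodes (Euclidean distance)."""
    neighbors = {}
    for i, xi in enumerate(features):
        dists = sorted((math.dist(xi, xj), j)
                       for j, xj in enumerate(features) if j != i)
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(features, neighbors, a):
    """GAT-style coefficients: softmax over neighbors j of
    LeakyReLU(a . [h_i || h_j]). The learned projection W is
    replaced by the identity to keep the sketch short."""
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = [leaky_relu(sum(w * v for w, v in
                                 zip(a, features[i] + features[j])))
                  for j in nbrs]
        top = max(scores)                      # subtract max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha

def aggregate(features, alpha):
    """Attention-weighted sum of neighbor features for every node."""
    dim = len(features[0])
    return [[sum(w * features[j][d] for j, w in alpha[i].items())
             for d in range(dim)] for i in range(len(features))]

# Hypothetical toy cohort: 5 patients, 2 made-up lab features each.
patients = [[1.0, 200.0], [1.1, 210.0], [3.0, 400.0],
            [3.2, 390.0], [1.05, 205.0]]
a = [0.5, -0.25, 0.5, -0.25]   # assumed attention vector, length 2 * features

X = standardize(patients)
graph = knn_edges(X, k=2)
alpha = attention_weights(X, graph, a)
H = aggregate(X, alpha)

for i in graph:
    print(i, graph[i], {j: round(w, 3) for j, w in alpha[i].items()})
```

The per-edge weights in `alpha` are the kind of quantity the abstract refers to as attention-based explanations: for each patient they indicate which neighbors contributed most to the aggregated representation.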


