Multi-scale nested graph transformer with graph operations: Advancing high-resolution chest x-ray classification

IF 3.2 · Zone 2 (Medicine) · JCR Q1, Radiology, Nuclear Medicine & Medical Imaging

Medical Physics · Pub Date: 2025-09-25 · DOI: 10.1002/mp.70003

Dongjing Shan, Mengchu Yang, Lu Huang, Dawa Panduo, Biao Qu

Medical Physics, Volume 52, Issue 10 · Journal Article

Citations: 0

Abstract

Background

Accurate classification of high-resolution chest x-ray (CXR) images is critical for diagnosing lung conditions such as pneumonia and for detecting small lesions, which requires precise feature extraction from multi-scale anatomical structures. Traditional deep learning models struggle to balance local detail retention with global context modeling, particularly when labeled data are limited and high-resolution inputs incur high computational costs.
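To make the computational-cost argument concrete, the back-of-the-envelope derivation below assumes ViT-style tokenization (non-overlapping p × p patches on a square H × H image) and full self-attention with token width d. The image and patch sizes in the second line are illustrative values, not numbers from the paper.

```latex
N = \left(\frac{H}{p}\right)^{2}, \qquad
\mathrm{cost}_{\mathrm{attn}} \propto N^{2} d = \left(\frac{H}{p}\right)^{4} d
\\[4pt]
H = 512,\ p = 16:\quad N = 32^{2} = 1024,\quad N^{2} = 1\,048\,576
\\
H = 1024,\ p = 16:\quad N = 64^{2} = 4096,\quad N^{2} = 16\,777\,216
```

Doubling the image side therefore multiplies the number of pairwise attention scores by 2^4 = 16, which is why reducing the token count matters so much for high-resolution inputs.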

Purpose

This study introduces a multi-scale nested graph transformer (MNGT) to address these challenges, aiming to enhance classification accuracy for high-resolution CXR images while improving computational efficiency and generalization in data-constrained scenarios.

Methods

(1) Multi-scale nested architecture: High-resolution CXR images are partitioned hierarchically, first into large square blocks and then into smaller patches within each block. A graph Transformer with a variable attention scope processes these patches to capture features from local to global, preserving fine details of small lesions (e.g., nodule contours) while modeling long-range dependencies (e.g., lung texture patterns).

(2) Cross-attention fusion: Features from the high-resolution image and a downscaled low-resolution copy are fused by a cross-attention-based graph Transformer, enabling semantic interaction between scales and improving lesion discriminability.

(3) Graph pooling for efficiency: Graph pooling aggregates patches into semantic regions, reducing the token count and computational complexity (e.g., from 2401 to 196 tokens) while preserving structural integrity.

(4) Inductive bias integration: By incorporating graph convolution and adaptive receptive-field adjustment, the model leverages spatial prior knowledge to mitigate overfitting on small datasets and improve generalization.
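The abstract names the four mechanisms but not their exact operators, so the NumPy sketch below is illustrative only and is not the authors' released code. Everything concrete in it is an assumption: a 196 × 196 single-channel input, 28-pixel blocks and 4-pixel patches (chosen so the patch grid is 49 × 49 = 2401 tokens, matching the abstract's example), random linear embeddings of width 32, attention without learned Q/K/V projections, a 4× average-pooled low-resolution copy, DiffPool-style soft-assignment pooling to 196 regions, and a single symmetric-normalised GCN layer. The helper names `nested_partition`, `softmax`, and `attention` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def nested_partition(img, block, patch):
    """Split an image into square blocks, then each block into square patches.

    Returns flattened patch vectors, the id of the block holding each patch,
    and each patch's (row, col) position on the global patch grid.
    """
    H, W = img.shape
    assert H % block == 0 and W % block == 0 and block % patch == 0
    corners = [(y, x) for y in range(0, H, block) for x in range(0, W, block)]
    tokens, block_ids, pos = [], [], []
    for b, (by, bx) in enumerate(corners):
        for py in range(by, by + block, patch):
            for px in range(bx, bx + block, patch):
                tokens.append(img[py:py + patch, px:px + patch].ravel())
                block_ids.append(b)
                pos.append((py // patch, px // patch))
    return np.array(tokens), np.array(block_ids), np.array(pos, dtype=np.int32)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise; masked -inf entries become 0
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v, mask=None):
    """Scaled dot-product attention (identity Q/K/V projections, for brevity).

    mask[i, j] == False prevents query i from attending to key j.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v


# (1) Nested partition (assumed sizes): 196x196 image -> 7x7 blocks of 28 px,
#     each cut into 7x7 patches of 4 px, i.e. a 49x49 grid of 2401 tokens.
img = rng.random((196, 196))
hi_patches, block_ids, pos = nested_partition(img, block=28, patch=4)
d = 32
hi = hi_patches @ rng.standard_normal((16, d)) / 4  # random linear patch embedding

# Variable attention scope: first within each block (local), then over all tokens (global).
same_block = block_ids[:, None] == block_ids[None, :]
hi = hi + attention(hi, hi, hi, mask=same_block)
hi = hi + attention(hi, hi, hi)

# (2) Cross-scale fusion: a 4x average-downscaled copy (49x49) cut into 7x7-px
#     patches gives 49 low-res tokens; high-res tokens query them.
lo_img = img.reshape(49, 4, 49, 4).mean(axis=(1, 3))
lo_patches, _, _ = nested_partition(lo_img, block=49, patch=7)
lo = lo_patches @ rng.standard_normal((49, d)) / 7
hi = hi + attention(hi, lo, lo)

# (3) Graph pooling: 4-neighbour patch graph, then DiffPool-style soft assignment
#     to K regions (an assumed stand-in; random here, learned in a real model).
dy = np.abs(pos[:, None, 0] - pos[None, :, 0])
dx = np.abs(pos[:, None, 1] - pos[None, :, 1])
A = ((dy + dx) == 1).astype(float)
K = 196
S = softmax(hi @ rng.standard_normal((d, K)) / np.sqrt(d))  # (2401, 196), rows sum to 1
pooled = S.T @ hi       # (196, d) region features
A_pooled = S.T @ A @ S  # (196, 196) coarsened adjacency

# (4) One GCN layer on the pooled graph: symmetric normalisation with self-loops.
A_hat = A_pooled + np.eye(K)
deg = A_hat.sum(axis=1)
norm_adj = A_hat / np.sqrt(np.outer(deg, deg))
out = np.maximum(norm_adj @ pooled @ rng.standard_normal((d, d)) / np.sqrt(d), 0)  # ReLU

ratio = hi.shape[0] ** 2 / out.shape[0] ** 2
print(f"{hi.shape[0]} -> {out.shape[0]} tokens; ~{ratio:.0f}x fewer pairwise attention scores")
```

The block mask is what limits the first attention pass to a local scope; omitting it in the second pass widens the scope to the whole image. Because full attention cost grows with the square of the token count, any attention applied after pooling 2401 tokens into 196 scores roughly 150× fewer pairs, although the pooling step itself still reads the full 2401-node graph once.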

Results

Extensive experiments on three types of high-resolution CXR images show that our architecture outperforms the compared models in both accuracy and F1-score. An ablation study further confirms the efficiency of the proposed architecture. The code, including the comparative models, is publicly available at GitHub/MNGT.

Conclusions

MNGT provides an efficient and robust solution for high-resolution CXR classification, combining local detail preservation, global context modeling, and inductive bias to excel in accuracy and generalization. The framework addresses the computational bottleneck of high-resolution medical imaging and offers a viable pathway for clinical deployment in computer-aided diagnosis.

Source journal

Medical Physics (category: Medicine, Nuclear Medicine)

CiteScore: 6.80

Self-citation rate: 15.80%

Articles published: 660

Review time: 1.7 months

Journal description: Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in (1) basic science developments with high potential for clinical translation, (2) clinical applications of cutting-edge engineering and physics innovations, and (3) broadly applicable and innovative clinical physics developments. Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience, including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.