Accurate classification of high-resolution chest X-ray (CXR) images is critical for diagnosing lung conditions such as pneumonia and for detecting small lesions, which demands precise feature extraction from multi-scale anatomical structures. Traditional deep learning models struggle to balance local detail retention against global context modeling, particularly when labeled data are limited and high-resolution inputs incur high computational cost.
This study introduces a multi-scale nested graph transformer (MNGT) to address these challenges, aiming to enhance classification accuracy for high-resolution CXR images while improving computational efficiency and generalization in data-constrained scenarios.
MNGT has four components. (1) Multi-scale nested architecture: each high-resolution CXR image is partitioned hierarchically, first into large square blocks and then into smaller patches within each block. A graph Transformer with a variable attention scope processes these patches to capture local-to-global features, preserving fine details of small lesions (e.g., nodule contours) while modeling long-range dependencies (e.g., lung texture patterns). (2) Cross-attention fusion: features from the high-resolution image and a downscaled low-resolution copy are fused by a cross-attention-based graph Transformer, enabling semantic interaction between scales and enhancing lesion discriminability. (3) Graph pooling for efficiency: graph pooling aggregates patches into semantic regions, reducing the token count (e.g., from 2401 to 196 tokens) and hence the computational complexity while maintaining structural integrity. (4) Inductive bias integration: by incorporating graph convolution and adaptive receptive field adjustment, the model exploits spatial prior knowledge to mitigate overfitting on small datasets and improve generalization.
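The nested partitioning and the token reduction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The image size (784), block size (112), and patch size (16) are assumptions chosen only so that the patch count matches the 2401 tokens quoted above. The softmax soft-assignment pooling is a generic stand-in for the paper's graph pooling, with random scores in place of learned ones. The function names `nested_partition` and `soft_assignment_pool` are hypothetical.

```python
import numpy as np


def nested_partition(image, block, patch):
    """Split a 2-D image into square blocks, then each block into square patches.

    Returns an array of shape (n_blocks, patches_per_block, patch, patch).
    """
    h, w = image.shape
    assert h % block == 0 and w % block == 0 and block % patch == 0
    gb_h, gb_w = h // block, w // block  # blocks along each axis
    gp = block // patch                  # patches along each side of a block
    # Axes (block_row, row_in_block, block_col, col_in_block) -> blocks first.
    blocks = image.reshape(gb_h, block, gb_w, block).transpose(0, 2, 1, 3)
    # Split every block the same way into patches.
    patches = blocks.reshape(gb_h, gb_w, gp, patch, gp, patch)
    patches = patches.transpose(0, 1, 2, 4, 3, 5)
    return patches.reshape(gb_h * gb_w, gp * gp, patch, patch)


def soft_assignment_pool(x, logits):
    """Pool N node features (N, d) into K regions using assignment scores (N, K).

    Each node spreads a total weight of 1 over the K regions (row-wise softmax).
    In a trained model the scores would be learned; here they are supplied.
    """
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    s = np.exp(logits)
    s /= s.sum(axis=1, keepdims=True)
    return s.T @ x  # shape (K, d)


rng = np.random.default_rng(0)
image = rng.random((784, 784))                 # assumed input size
patches = nested_partition(image, block=112, patch=16)
tokens = patches.reshape(-1, 16 * 16)          # one flattened token per patch
scores = rng.standard_normal((tokens.shape[0], 196))  # placeholder, not learned
pooled = soft_assignment_pool(tokens, scores)

print(patches.shape, tokens.shape, pooled.shape)
```

Under these assumptions the image yields 49 blocks of 49 patches each, i.e. 2401 tokens, which pooling reduces to 196. If the downstream attention is full self-attention, whose cost grows with the square of the token count, that reduction cuts the number of pairwise interactions by a factor of roughly 150.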
In extensive experiments on three types of high-resolution CXR images, our architecture surpasses the compared models in both accuracy and F1-score. An ablation study further highlights the efficiency of the architecture. The code, including the comparative models, is publicly available at GitHub/MNGT.
MNGT provides an efficient and robust solution for high-resolution CXR classification, combining local detail preservation, global context modeling, and inductive bias to excel in accuracy and generalization. The framework addresses the computational bottleneck of high-resolution medical imaging and offers a viable pathway for clinical deployment in computer-aided diagnosis.