Dongjing Shan, Mengchu Yang, Lu Huang, Dawa Panduo, Biao Qu
Multi-scale nested graph transformer with graph operations: Advancing high-resolution chest x-ray classification
Medical Physics, Volume 52, Issue 10. Published 2025-09-25. Journal Article. Impact factor 3.2; JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging).
DOI: 10.1002/mp.70003
https://aapm.onlinelibrary.wiley.com/doi/10.1002/mp.70003
Background
Accurate classification of high-resolution chest x-ray (CXR) images is critical for diagnosing lung conditions such as pneumonia and identifying small lesion targets, which demands precise feature extraction from multi-scale anatomical structures. Traditional deep learning models face challenges in balancing local detail retention and global context modeling, particularly with limited labeled data and high computational costs for high-resolution inputs.
Purpose
This study introduces a multi-scale nested graph transformer (MNGT) to address these challenges, aiming to enhance classification accuracy for high-resolution CXR images while improving computational efficiency and generalization in data-constrained scenarios.
Methods
(1) Multi-scale nested architecture: High-resolution CXR images are partitioned into hierarchical squares, first into large blocks and then into smaller patches. A graph Transformer with variable attention scope processes these patches to capture local-to-global features, preserving fine details of small lesions (e.g., nodule contours) while modeling long-range dependencies (e.g., lung texture patterns).
(2) Cross-attention fusion: Features from high-resolution and downscaled low-resolution images are fused using a cross-attention-based graph Transformer, enabling semantic interaction between scales and enhancing lesion discriminability.
(3) Graph pooling for efficiency: Graph pooling aggregates patches into semantic regions, reducing the token count and hence the computational complexity (e.g., from 2401 to 196 tokens) while maintaining structural integrity.
(4) Inductive bias integration: By incorporating graph convolution and adaptive receptive field adjustments, the model mitigates overfitting on small datasets, leveraging spatial prior knowledge to improve generalization.
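The nested partition in (1) and the token reduction in (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nested_partition` and `attention_pairs`, the 1024-pixel image, and the 256-pixel block and 64-pixel patch sizes are all hypothetical, since the abstract does not state them. The cost estimate also assumes full self-attention, where every token attends to every other token; MNGT's variable attention scope may behave differently.

```python
def nested_partition(height, width, block, patch):
    """Two-level square partition of an image grid.

    Returns {(block_row, block_col): [(top, left, size), ...]}, one list of
    patch boxes per block. Assumes (for illustration only) that height and
    width are multiples of `block`, and `block` is a multiple of `patch`.
    """
    if height % block or width % block or block % patch:
        raise ValueError("sizes must nest evenly")
    blocks = {}
    for bi, top in enumerate(range(0, height, block)):
        for bj, left in enumerate(range(0, width, block)):
            # Subdivide this block into patch-sized squares, row by row.
            blocks[(bi, bj)] = [
                (top + dy, left + dx, patch)
                for dy in range(0, block, patch)
                for dx in range(0, block, patch)
            ]
    return blocks


def attention_pairs(num_tokens):
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


# Hypothetical sizes, not taken from the paper.
blocks = nested_partition(1024, 1024, block=256, patch=64)
num_blocks = len(blocks)
patches_per_block = len(blocks[(0, 0)])
total_patches = sum(len(boxes) for boxes in blocks.values())

# Token counts quoted in the abstract: 2401 before pooling, 196 after.
reduction = attention_pairs(2401) / attention_pairs(196)

print(num_blocks, patches_per_block, total_patches)
print(f"attention pairs shrink by a factor of about {reduction:.1f}")
```

Under the full-attention assumption the cost grows with the square of the token count, so a roughly 12-fold drop in tokens corresponds to a far larger drop in pairwise comparisons, which is why pooling patches into regions matters for high-resolution inputs.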
Results
Extensive experiments on three types of high-resolution CXR images show that our architecture surpasses other models in both accuracy and F1-score. Furthermore, our ablation study highlights the efficiency of the proposed architecture. The code, including the comparative models, is publicly available on GitHub: GitHub/MNGT.
Conclusions
MNGT provides an efficient and robust solution for high-resolution CXR classification, combining local detail preservation, global context modeling, and inductive bias to excel in accuracy and generalization. The framework addresses the computational bottleneck of high-resolution medical imaging and offers a viable pathway for clinical deployment in computer-aided diagnosis.
About the journal:
Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in 1) basic science developments with high potential for clinical translation; 2) clinical applications of cutting-edge engineering and physics innovations; and 3) broadly applicable and innovative clinical physics developments.
Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.