Authors: Zhikang Xu, Jiye Liang, Zhipeng Wei, Xiaodong Yue, Deyu Li
Journal: Pattern Recognition, volume 172, Article 112414 (Journal Article, published 2025-09-09)
DOI: 10.1016/j.patcog.2025.112414
URL: https://www.sciencedirect.com/science/article/pii/S0031320325010751
Impact factor: 7.6; JCR: Q1 (Computer Science, Artificial Intelligence)
Clinical knowledge enhanced medical image classification
Because data are scarce in the medical field, deep learning-based medical image classification faces challenges in both accuracy and reliability. Foundation models (FMs) offer a promising enhancement strategy: textual medical knowledge embeddings are extracted from FMs and used to guide a task-specific classification model. However, clinical knowledge is generally structured, and pure text as a knowledge representation may not be sufficient to enhance the downstream model. Moreover, lesion areas are generally subtle, so combining FMs with the downstream model in a coarse-grained manner still struggles to attend precisely to the lesions. To tackle these challenges, we propose a novel medical image classification model that embeds clinical knowledge by combining graphs and FMs. First, we represent clinical rules as graphs in which each node describes a critical characteristic of the disease. During training, we use FMs to extract embeddings of the node text descriptions and a graph transformer to extract a global representation of each graph. Using a vision transformer to encode input images, we propose a global-local alignment module that transfers clinical knowledge by aligning the embeddings of the image branch and the graph branch at the image-to-graph level and the patch-to-vertex level. Moreover, we propose a dynamic image patch selection method to reduce the model's attention to irrelevant and noisy regions. Experimental results on a bladder tumor classification dataset verify that, even with limited training data, the proposed method not only achieves state-of-the-art (SOTA) performance but also accurately attends to the lesion areas, thus improving trustworthiness.
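The abstract does not give the loss functions or the patch selection rule, so the following is only a minimal sketch of how a global-local alignment with patch selection might look, not the authors' implementation. It assumes cosine similarity as the alignment measure, a hypothetical selection rule that keeps the `keep` patches best matched to any graph vertex, and toy two-dimensional vectors in place of real vision transformer and graph transformer outputs. The names `select_patches` and `alignment_loss` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_patches(patches, vertices, keep):
    """Hypothetical selection rule (an assumption, not from the paper):
    keep the `keep` patches whose best match among the vertices is strongest."""
    scores = [max(cosine(p, v) for v in vertices) for p in patches]
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def alignment_loss(image_vec, graph_vec, patches, vertices, keep):
    """Assumed loss: an image-to-graph term plus a patch-to-vertex term."""
    # Global level: whole-image embedding against whole-graph embedding.
    global_term = 1.0 - cosine(image_vec, graph_vec)
    # Local level: each vertex against its best-matching selected patch.
    kept = select_patches(patches, vertices, keep)
    local_term = sum(
        1.0 - max(cosine(patches[i], v) for i in kept) for v in vertices
    ) / len(vertices)
    return global_term + local_term, kept


# Toy inputs: three patch embeddings and two vertex embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
vertices = [[1.0, 0.0], [0.0, 1.0]]
loss, kept = alignment_loss([1.0, 1.0], [1.0, 1.0], patches, vertices, keep=2)
```

In this toy case the third patch points away from both vertices, so the selection step drops it before the local term is computed. A trained model would presumably learn the selection jointly with the encoders rather than apply a fixed top-k rule.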
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.