Hierarchical vision transformer model for polyp segmentation
Authors: G. S, G. C., Vishnu Vinod
Venue: 2023 International Conference on Advances in Intelligent Computing and Applications (AICAPS)
Publication date: 2023-02-01
DOI: 10.1109/AICAPS57044.2023.10074447 (https://doi.org/10.1109/AICAPS57044.2023.10074447)
Citation count: 0
Abstract
Medical image analysis plays an important role in clinical assistance for the diagnosis and treatment of diseases. Image segmentation is an essential part of the medical imaging process, as it extracts the region of interest through semi-automated or automated methods. Deep learning approaches have emerged as a fast-growing research field in medical image analysis. Vision transformers (ViTs) are deep learning models that have become a competitive alternative to convolutional neural networks (CNNs), with reported breakthroughs in computer vision tasks including object classification, detection, localization, and segmentation. Colon polyp detection and segmentation is a challenging task in the diagnosis and prognosis of colorectal cancer. Early detection and segmentation of polyp regions are of the utmost importance in preventing progression of the disease to later stages. In this work, we explore a hierarchical vision transformer as the backbone, replacing CNNs, for the segmentation of polyps. The hierarchical vision transformer is composed of several stages, each operating at a different resolution. A convolutional decoder successively combines the patches from the various stages to produce full predictions. The transformer backbone has a global receptive field at every stage, which provides finer-grained and globally relevant predictions. Experimental results indicate that the architecture can be fine-tuned to generate promising results on segmentation metrics even on smaller datasets, with mean Dice and mean IoU scores of 74% and 73% on the Kvasir-SEG dataset.
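For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Dice and IoU can be computed for binary segmentation masks. It is not taken from the paper: the function names `dice_and_iou` and `mean_scores`, the smoothing constant `eps`, and the choice to average per image are all assumptions made for illustration, and the authors' evaluation protocol may differ.

```python
from typing import Iterable, Sequence, Tuple

# Binary mask as rows of 0/1 pixel labels (hypothetical representation).
Mask = Sequence[Sequence[int]]


def dice_and_iou(pred: Mask, target: Mask, eps: float = 1e-7) -> Tuple[float, float]:
    """Return (Dice, IoU) for one pair of binary masks of equal shape."""
    intersection = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p & t   # pixel is foreground in both masks
            pred_area += p          # foreground pixels in the prediction
            target_area += t        # foreground pixels in the ground truth
    union = pred_area + target_area - intersection
    # eps keeps both scores defined (and equal to 1) when both masks are empty.
    dice = (2 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


def mean_scores(pairs: Iterable[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Average per-image Dice and IoU (assumed averaging scheme)."""
    scores = [dice_and_iou(pred, target) for pred, target in pairs]
    n = len(scores)
    mean_dice = sum(d for d, _ in scores) / n
    mean_iou = sum(i for _, i in scores) / n
    return mean_dice, mean_iou


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(dice_and_iou(pred, target))
```

For a single image, Dice is never lower than IoU, since Dice = 2·IoU / (1 + IoU); the gap between the mean values over a dataset depends on how the per-image scores are distributed.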