{"title":"Hard-UNet architecture for medical image segmentation using position encoding generator: LSA based encoder","authors":"Chia-Jui Chen","doi":"10.1016/j.jvcir.2025.104452","DOIUrl":null,"url":null,"abstract":"<div><div>Researchers have focused on the rising use of convolutional neural networks (CNNs) in segmentation, emphasizing the pivotal role of encoders in learning the global and local information essential for prediction. Because their local receptive fields limit CNNs' ability to capture long-range spatial relationships, interest has turned to the Swin Transformer. Inspired by transformer successes in NLP, Hard-UNet is introduced as a novel approach that blends CNNs and transformers to address this gap. Hard-UNet leverages HardNet for deep feature extraction and implements a transformer-based module for self-attention within sub-windows. Experimental results demonstrate a significant performance leap over existing methods, notably enhancing segmentation accuracy on medical image datasets such as ISIC 2018 and BUSI. Outperforming UNeXt and ResUNet, Hard-UNet delivers a 16.24% enhancement in segmentation accuracy, achieving state-of-the-art results of 83.19% and 83.26% on the ISIC dataset.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104452"},"PeriodicalIF":2.6000,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Communication and Image Representation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1047320325000665","RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
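The abstract describes a transformer-based module whose self-attention is confined to sub-windows of the feature map. As a rough, dependency-free sketch of that general idea (not the paper's actual implementation: identity Q/K/V projections, single head, NumPy instead of a deep-learning framework, and the hypothetical window size `ws` is illustrative):

```python
import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws
    # sub-windows, returning shape (num_windows, ws*ws, C).
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(x, ws):
    # Scaled dot-product attention computed independently inside each
    # sub-window, so tokens only "communicate" within their window.
    H, W, C = x.shape
    win = window_partition(x, ws)                       # (nW, ws*ws, C)
    attn = softmax(win @ win.transpose(0, 2, 1) / np.sqrt(C))
    out = attn @ win                                    # (nW, ws*ws, C)
    # Reverse the partition back to (H, W, C).
    nH, nW = H // ws, W // ws
    return (out.reshape(nH, nW, ws, ws, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(H, W, C))

feat = np.random.rand(8, 8, 16)   # toy feature map
y = local_self_attention(feat, ws=4)
print(y.shape)  # (8, 8, 16)
```

Because the softmax mixes only tokens from the same window, perturbing a pixel in one sub-window leaves the outputs of all other sub-windows unchanged, which is the locality property the windowed design trades for global context.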
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.