Unified ARP-ViT-CNN system: Hybrid deep learning approach for segmenting and classifying multiple skin cancer lesions

Impact factor: 4.5. JCR: Q2 (Computer Science, Theory & Methods)

Array. Pub Date: 2025-09-20. DOI: 10.1016/j.array.2025.100515

J.S. ThangaPurni, M. Braveen

Array, volume 28, Article 100515. Journal Article. https://www.sciencedirect.com/science/article/pii/S2590005625001420


Abstract

Skin cancer remains one of the most difficult conditions to diagnose accurately: different lesion types can look very similar, and diagnostic images often contain noise and large variations. Many existing deep learning models, including conventional neural networks and Transformer-based models, struggle to capture both fine-grained local detail and the broader context of an image. To address these challenges, we developed a hybrid ARP-ViT-CNN model with a three-stream feature extraction process: Angular Radial Partitioning (ARP) extracts geometric and structural patterns, a Convolutional Neural Network (CNN) extracts subtle local features, and a Vision Transformer (ViT) encodes long-range dependencies through self-attention. The resulting feature representation feeds a classification model that differentiates several forms of skin malignancy. The proposed model integrates a comprehensive pre-processing pipeline, including dataset balancing and enhancement, to address underrepresented classes and image variability. We evaluated the model on the publicly available HAM10000 dataset for multi-class skin cancer classification and segmentation. As anticipated, the ARP-ViT-CNN model outperformed conventional CNN-only and Transformer-only models on this dataset, achieving an overall accuracy of 98.2% with a precision of 0.94, a recall of 0.96, and a macro-F1 score of 0.95. The segmentation module, which outlines lesion boundaries, also performed well and was especially reliable at delineating the boundaries of complicated lesions in the test data.
Overall, the results indicate that the ARP-ViT-CNN framework is effective: the ViT is well suited to global contextual learning, while ARP makes the model invariant to rotation and scale, which likely contributes to its tolerance of complex dermatological presentations. The ARP-ViT-CNN model offers a modern approach to fully automated skin cancer image diagnosis and a basis for future AI-powered medical imaging.
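
The abstract does not spell out how its ARP stream is computed, so the following is only a minimal sketch of one common ARP formulation, written to illustrate why such a descriptor can tolerate rotation. It assumes a square binary edge map as input, counts edge pixels in ring-by-wedge sectors around the image centre, and takes the magnitude of a discrete Fourier transform along the angular axis. The names `arp_features` and `rotate90`, the sector counts, and the normalisation by total edge pixels are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def arp_features(edge_map, n_radial=3, n_angular=4):
    """Return an ARP-style descriptor for a square binary edge map.

    Hypothetical helper: sector sizes and normalisation are assumptions
    made for this sketch, not details taken from the paper.
    """
    size = len(edge_map)
    centre = (size - 1) / 2.0
    max_radius = centre * math.sqrt(2) + 1e-9  # distance to a corner pixel

    # Count edge pixels in each (ring, wedge) sector around the centre.
    counts = [[0.0] * n_angular for _ in range(n_radial)]
    total = 0
    for row in range(size):
        for col in range(size):
            if not edge_map[row][col]:
                continue
            dy, dx = row - centre, col - centre
            radius = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            ring = min(int(radius / max_radius * n_radial), n_radial - 1)
            wedge = min(int(angle / (2 * math.pi) * n_angular), n_angular - 1)
            counts[ring][wedge] += 1
            total += 1

    # Dividing by the total makes the descriptor independent of how many
    # edge pixels the image contains (a crude form of scale tolerance).
    if total:
        counts = [[c / total for c in ring_counts] for ring_counts in counts]

    # Rotating the image by a multiple of the wedge angle circularly shifts
    # each ring's counts; DFT magnitudes do not change under such shifts.
    features = []
    for ring_counts in counts:
        for k in range(n_angular):
            coeff = sum(
                value * cmath.exp(-2j * math.pi * k * i / n_angular)
                for i, value in enumerate(ring_counts)
            )
            features.append(abs(coeff))
    return features


def rotate90(grid):
    """Rotate a square grid by 90 degrees about its centre."""
    return [list(row) for row in zip(*grid[::-1])]


if __name__ == "__main__":
    grid = [[0] * 8 for _ in range(8)]
    for r, c in [(0, 1), (1, 5), (2, 2), (3, 6), (6, 3), (7, 7)]:
        grid[r][c] = 1
    original = arp_features(grid)
    rotated = arp_features(rotate90(grid))
    print(max(abs(a - b) for a, b in zip(original, rotated)))
```

In this sketch the invariance is exact only for rotations that are multiples of the wedge angle (90 degrees with the default `n_angular=4`); for other angles the descriptor changes gradually as pixels cross sector boundaries, and finer angular partitions trade robustness to small rotations for more detail. In a three-stream model like the one described, a vector of this kind would sit alongside the CNN and ViT features before classification.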
