A Deep Learning-Based Approach for Cervical Cancer Classification Using 3D CNN and Vision Transformer

Authors: Abinaya K., Sivakumar B.
Journal: Journal of Digital Imaging
DOI: 10.1007/s10278-023-00911-z
Published: 2024-01-10 (Journal Article)
Impact Factor: 2.9 | JCR: Q2, Radiology, Nuclear Medicine & Medical Imaging
Source: Semanticscholar
Citations: 0

Abstract
A Deep Learning-Based Approach for Cervical Cancer Classification Using 3D CNN and Vision Transformer
Cervical cancer is a significant health problem worldwide, and early detection and treatment are critical to improving patient outcomes. To address this challenge, a deep learning (DL)-based cervical cancer classification system is proposed using a 3D convolutional neural network (3D CNN) and a Vision Transformer (ViT) module. The proposed model leverages the 3D CNN to extract spatiotemporal features from cervical images and employs the ViT module to capture and learn complex feature representations. The model consists of an input layer that receives cervical images, followed by a 3D convolution block that extracts features from them. The resulting feature maps are down-sampled with a max-pooling block to eliminate redundant information while preserving important features. Four Vision Transformer models then extract feature maps at different levels of abstraction, each producing an efficient set of feature maps that captures spatiotemporal information at a specific level. These feature maps are fed into a 3D feature pyramid network (FPN) module for feature concatenation. A 3D squeeze-and-excitation (SE) block recalibrates the feature responses of the network based on the interdependencies between feature maps, thereby improving the model's discriminative power. Finally, the feature maps are dimensionally reduced with a 3D average pooling layer, and the output is passed to a kernel extreme learning machine (KELM) for classification into one of five classes. The KELM uses a radial basis function (RBF) kernel to map features into a high-dimensional space and classify the input samples.
The proposed model's superiority is demonstrated by simulation results: it achieves an accuracy of 98.6%, showing its potential as an effective tool for cervical cancer classification. It can also serve as a diagnostic support tool to assist medical experts in accurately identifying cervical cancer in patients.
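As a concrete illustration of the final classification stage described above, the following is a minimal NumPy sketch of a kernel extreme learning machine with an RBF kernel. It uses the generic closed-form KELM solution beta = (I/C + K)^-1 T, not the paper's exact configuration; the regularization constant `C`, kernel width `gamma`, and toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel: exp(-gamma * ||a - b||^2) for each row pair.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

class KELM:
    """Generic kernel extreme learning machine with an RBF kernel.

    A sketch of the technique, not the paper's implementation; C and
    gamma are hyperparameters that would be tuned in practice.
    """
    def __init__(self, C=1.0, gamma=1.0):
        self.C = C
        self.gamma = gamma

    def fit(self, X, y, n_classes):
        self.X_train = X
        T = np.eye(n_classes)[y]                      # one-hot targets
        K = rbf_kernel(X, X, self.gamma)              # n x n kernel matrix
        # Closed-form ridge solution: beta = (I/C + K)^-1 T
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self

    def predict(self, X):
        # Project test samples onto the training kernel, take argmax class.
        return (rbf_kernel(X, self.X_train, self.gamma) @ self.beta).argmax(axis=1)

# Toy usage: two well-separated 2D clusters stand in for pooled features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
model = KELM(C=10.0, gamma=0.5).fit(X, y, n_classes=2)
preds = model.predict(X)  # recovers the training labels on this toy data
```

Because training reduces to one linear solve, a KELM head is cheap to fit on top of frozen deep features, which is one reason it is a common drop-in replacement for a softmax classifier in hybrid pipelines like the one described here.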
About the Journal:
The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI’s goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine such as research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals.
Suggested Topics
PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.