Research on Multi-classification of TCM Tongue Symptoms Based on Self-supervised Learning

2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE) | Pub Date: 2023-04-14 | DOI: 10.1109/CISCE58541.2023.10142351

ZhuLin Guo, Xuan Liu, Wenjian Liu




Research on Multi-classification of TCM Tongue Symptoms Based on Self-supervised Learning

With the rapid advancement of computer technology and the needs of social development, "Internet + health care" has become an inevitable trend, and intelligent TCM (Traditional Chinese Medicine) tongue diagnosis is one of its development directions. In recent years, neural networks have made significant progress in medical image classification. However, neural networks usually need a large number of training samples to achieve precise results, and in the medical field collecting large labeled datasets is costly. To address this, we introduce a self-supervised Transformer framework, named SViT-T5, and build a masked image modeling task based on the Vision Transformer (ViT) architecture. The method treats pixel restoration as the proxy task for self-supervised learning and image classification as the downstream task. We pre-train the model on a large amount of unlabeled data and fine-tune it on TongueNet, a dataset we created that covers five categories of tongue images: cold syndrome, heat syndrome, deficiency syndrome, excess syndrome, and normal tongue. We compare the classification results against baseline models without pre-training (ConvNeXt, ResNeXt, ViT, Swin Transformer) and against models pre-trained on the ImageNet dataset. The proposed SViT-T5 framework reaches a classification accuracy of 85.938%. Compared with the other methods, our approach achieves higher classification accuracy and better generalization ability.
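The pixel-restoration proxy task described in the abstract can be illustrated with a minimal sketch. The abstract does not give the patch size, the mask ratio, or the loss function, so the values below (4x4 patches, a 0.5 mask ratio, an L1 loss computed only on masked patches) and all function names are illustrative assumptions, not the SViT-T5 implementation. A real setup would feed the visible patches to a ViT encoder; here a constant prediction stands in for the model.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch blocks, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch) for dx in range(patch)])
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch: True means the patch is hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_l1_loss(target_patches, predicted_patches, mask):
    """Mean absolute pixel error, computed over the masked patches only."""
    total, count = 0.0, 0
    for target, pred, is_masked in zip(target_patches, predicted_patches, mask):
        if is_masked:
            total += sum(abs(t - p) for t, p in zip(target, pred))
            count += len(target)
    return total / count if count else 0.0


# Toy 8x8 "image"; patch size and mask ratio are assumed values.
rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)
mask = random_mask(len(patches), mask_ratio=0.5, rng=rng)

# Stand-in for a model output: predict zero for every pixel.
prediction = [[0.0] * len(p) for p in patches]
loss = masked_l1_loss(patches, prediction, mask)
```

Restricting the loss to masked patches means the model is scored only on pixels it could not see, which is what makes restoration a useful pretext task when labels are scarce.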
