Research on Multi-classification of TCM Tongue Symptoms Based on Self-supervised Learning
Authors: ZhuLin Guo, Xuan Liu, Wenjian Liu
Published in: 2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE), 2023-04-14
DOI: 10.1109/CISCE58541.2023.10142351
Computer technology is advancing rapidly, and "Internet + health care" has become an important trend in the development of social life; intelligent TCM (Traditional Chinese Medicine) tongue diagnosis is one of its research directions. In recent years, neural networks have made significant progress in medical image classification. However, they typically need large amounts of training data to achieve accurate results, and collecting large labeled datasets in the medical field is costly. To address this, we introduce a self-supervised Transformer framework, named SViT-T5, which builds a masked image modeling task on the Vision Transformer (ViT) architecture. The method treats pixel restoration as the proxy task for self-supervised learning and image classification as the downstream task. We pre-train the model on a large amount of unlabeled data and fine-tune it on TongueNet, a dataset we created, which covers five categories of tongue images: cold syndrome, heat syndrome, deficiency syndrome, excess syndrome, and normal tongue. We compare the classification results with baseline models without pre-training (ConvNeXt, ResNeXt, ViT, Swin Transformer) and with models pre-trained on the ImageNet dataset. The proposed SViT-T5 framework reaches a classification accuracy of 85.938% and shows higher classification accuracy and better generalization ability than the other methods.
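To make the proxy task concrete, the following is a minimal sketch of masked image modeling with a pixel-restoration loss. It is not the SViT-T5 implementation: the abstract does not give the patch size, mask ratio, or loss function, so the 4x4 patches, the 50% mask ratio, the mean squared error, and all function names here are illustrative assumptions. The model itself is replaced by a stand-in that predicts zeros.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch x patch blocks."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def random_mask(num_patches, mask_ratio, rng):
    """Pick the patch indices that are hidden from the model (assumed random masking)."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))

def masked_reconstruction_loss(target_patches, predicted_patches, masked_idx):
    """Mean squared pixel error, computed over the masked patches only."""
    if not masked_idx:
        return 0.0
    total, count = 0.0, 0
    for i in masked_idx:
        for t, p in zip(target_patches[i], predicted_patches[i]):
            total += (t - p) ** 2
            count += 1
    return total / count

rng = random.Random(0)
# Toy 8x8 "image" with pixel values in [0, 1]; a real input would be a tongue photo.
image = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)            # 4 patches of 16 pixels each
masked = random_mask(len(patches), 0.5, rng)  # hide half of the patches (assumed ratio)
# Stand-in for the model output: all zeros. A ViT encoder plus a decoder head would go here.
prediction = [[0.0] * len(p) for p in patches]
loss = masked_reconstruction_loss(patches, prediction, masked)
print(f"masked patches: {masked}, reconstruction loss: {loss:.4f}")
```

Because the loss needs only the image itself as a target, this step can run on unlabeled data; labels are required only when the pre-trained encoder is later fine-tuned for the five-class classification task.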