Improving Vision Transformer with Multi-Task Training
Woojin Ahn, G. Yang, H. Choi, M. Lim, Tae-Koo Kang
2022 22nd International Conference on Control, Automation and Systems (ICCAS), published 2022-11-27
DOI: https://doi.org/10.23919/ICCAS55662.2022.10003833
Citations: 0
Abstract
Self-supervised learning methods have been shown to improve the performance of existing networks by learning visual representations from large amounts of unlabeled data. In this paper, we propose an end-to-end multi-task self-supervision method for the vision transformer. The network is given two tasks: inpainting and position prediction. Given a masked image, the network predicts the missing pixel information and also predicts the positions of the given puzzle patches. Through a classification experiment, we demonstrate that the proposed method improves the performance of the network compared to direct supervised learning.
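The two pretext tasks can be illustrated with a minimal sketch of how one training sample and a joint loss might be built. This is not the authors' implementation: the abstract does not state the patch size, mask ratio, mask value, loss functions, or loss weighting, so `patch`, `mask_ratio`, the zero mask value, the MSE and cross-entropy terms, `weight`, and every function name below are assumptions chosen for illustration. The transformer itself is omitted; only the targets and the combined objective are shown.

```python
import math
import random


def split_into_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches, row-major."""
    h, w = len(image), len(image[0])
    return [
        [image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def make_training_sample(image, patch=2, mask_ratio=0.5, seed=0):
    """Build inputs and targets for both tasks (all hyperparameters are assumed)."""
    rng = random.Random(seed)
    patches = split_into_patches(image, patch)
    n = len(patches)

    # Inpainting: hide a random subset of patches; their pixels become the targets.
    masked = sorted(rng.sample(range(n), int(n * mask_ratio)))
    masked_set = set(masked)
    inputs = [[0.0] * len(p) if i in masked_set else list(p)
              for i, p in enumerate(patches)]

    # Position prediction: shuffle the patches; the label of each puzzle patch
    # is its original grid index.
    order = list(range(n))
    rng.shuffle(order)

    return {
        "inputs": inputs,
        "masked": masked,
        "pixel_targets": [patches[i] for i in masked],
        "puzzle": [patches[i] for i in order],
        "position_labels": order,
    }


def mse(pred, target):
    """Mean squared error over all pixels of all masked patches."""
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, using the log-sum-exp trick for stability."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[label]
    return total / len(labels)


def multitask_loss(pixel_pred, pixel_target, pos_logits, pos_labels, weight=1.0):
    """Joint objective: inpainting loss plus weighted position loss (assumed form)."""
    return mse(pixel_pred, pixel_target) + weight * cross_entropy(pos_logits, pos_labels)


# Toy 4x4 image split into four 2x2 patches.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
sample = make_training_sample(image)

# Stand-in "predictions": perfect pixels and uninformative position logits.
n = len(sample["puzzle"])
loss = multitask_loss(sample["pixel_targets"], sample["pixel_targets"],
                      [[0.0] * n for _ in range(n)], sample["position_labels"])
print(loss)
```

In an end-to-end setup, a single backbone would produce both the pixel predictions and the position logits, so the summed loss trains both tasks at once. Whether the puzzle patches are drawn from the masked image or from a separate view is not specified in the abstract; the sketch shuffles the unmasked original patches purely for simplicity.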