End-to-End Image Colorization With Multiscale Pyramid Transformer
Authors: Tongtong Zhao; Gehui Li; Shanshan Zhao
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 11332-11344
Published: 2024-09-02
DOI: 10.1109/TMM.2024.3453035
URL: https://ieeexplore.ieee.org/document/10663257/
Impact factor: 8.4 (JCR Q1, Computer Science, Information Systems)
Image colorization is a challenging task because it is ill-posed and multimodal: a single grayscale image admits many plausible colorizations. Traditional approaches that rely on reference images or user guidance therefore often produce unsatisfactory results. Deep learning-based methods have been proposed, but they may still fall short because they lack semantic understanding. To overcome this limitation, we present an end-to-end automatic colorization method that requires no color reference images and achieves superior quantitative and qualitative results compared to state-of-the-art methods. Our approach incorporates a Multiscale Pyramid Transformer that captures both local and global contextual information, and a novel attention module called Dual-Attention, which replaces the traditional Window Attention and Channel Attention with the faster and lighter Separable Dilated Attention and Factorized Channel Attention. We also introduce a new color decoder called Color-Attention, which learns colorization patterns from the grayscale and color images of the current training set, improving generalizability and eliminating the need to construct color priors. Experimental results demonstrate the effectiveness of our approach on various benchmark datasets and on high-level computer vision tasks such as classification, segmentation, and detection. Our method offers robustness, generalization ability, and improved colorization quality.
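The abstract does not say how Factorized Channel Attention is computed, so the sketch below does not attempt to reproduce it. For orientation only, it shows one common form of conventional channel attention (squeeze-and-excitation style), the kind of module the abstract describes as being replaced. The function name, the weight matrices `w1` and `w2`, and the toy feature map are all hypothetical and chosen purely for illustration.

```python
import math


def channel_attention(features, w1, w2):
    """Generic squeeze-and-excitation style channel attention (illustrative only).

    features: list of C channels, each an H x W nested list of floats.
    w1: R x C weight matrix (hypothetical bottleneck layer).
    w2: C x R weight matrix (hypothetical expansion layer).
    Returns (rescaled_features, gates).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]
    # Excite: a small bottleneck MLP with ReLU in the middle.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # A sigmoid maps each channel score to a gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Rescale: multiply every value in a channel by that channel's gate.
    rescaled = [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(features, gates)
    ]
    return rescaled, gates


# Toy input: two 2x2 channels and hand-picked (not learned) weights.
toy_features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0, mean 2.5
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1, mean 0.5
]
toy_w1 = [[1.0, -1.0]]     # R = 1, C = 2
toy_w2 = [[1.0], [-1.0]]   # C = 2, R = 1

toy_out, toy_gates = channel_attention(toy_features, toy_w1, toy_w2)
print([round(g, 3) for g in toy_gates])  # [0.881, 0.119]
```

The gates here depend only on per-channel averages, so the module reweights whole channels without looking at spatial layout. Capturing spatial context is a separate concern, which the abstract assigns to the attention over local and global context.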
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.