MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA
Journal: Computer Vision and Image Understanding (Impact Factor 4.3, JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-08-08
DOI: 10.1016/j.cviu.2024.104104
URL: https://www.sciencedirect.com/science/article/pii/S1077314224001851
Citation count: 0
Abstract
Super-resolution image reconstruction has advanced quickly, and the many available techniques now produce a large number of super-resolution images. Nevertheless, accurately assessing the quality of these images remains a formidable challenge. This paper introduces Multi-Frequency Cascade Transformers (MFCT), a novel model for super-resolution image quality assessment (SR-IQA). First, a Frequency-Divided Module (FDM) transforms each super-resolution image into three frequency bands. Next, Cascade Transformer Blocks (CAF) with hierarchical self-attention mechanisms capture cross-window features for quality perception. Finally, the quality scores from the different frequency bands are fused to derive the overall image quality score. Experimental results show that, on the chosen SR-IQA databases, the proposed MFCT-based method consistently outperforms all compared Image Quality Assessment (IQA) models. Furthermore, a thorough set of ablation studies demonstrates that the proposed approach generalizes well compared with earlier competing methods. The code will be available at https://github.com/kbzhang0505/MFCT.
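The abstract does not say how the FDM separates frequencies or how the per-band scores are fused, so the following is only a generic illustration of the two outer steps, not the authors' method. It assumes a difference-of-box-blurs decomposition and a weighted average for fusion. All function names, radii, and weights are hypothetical, and the transformer stage is omitted.

```python
def box_blur(image, radius):
    """Mean filter with edge clamping; image is a list of rows of floats."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def subtract(a, b):
    """Pixel-wise difference a - b of two equally sized images."""
    return [[p - q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def split_three_bands(image, small_radius=1, large_radius=3):
    """Assumed stand-in for a frequency split: low, mid and high bands.

    The three bands sum back to the input image by construction.
    """
    coarse = box_blur(image, large_radius)   # low-frequency content
    fine = box_blur(image, small_radius)     # low + mid content
    low = coarse
    mid = subtract(fine, coarse)
    high = subtract(image, fine)
    return low, mid, high


def fuse_scores(band_scores, weights=None):
    """Assumed fusion rule: weighted average of per-band quality scores."""
    if weights is None:
        weights = [1.0] * len(band_scores)
    return sum(s * w for s, w in zip(band_scores, weights)) / sum(weights)
```

In a full pipeline each band would be passed to a learned quality predictor, and the three predicted scores would then go to `fuse_scores`. With equal weights this reduces to the plain mean of the band scores.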
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems