An Efficient Frequency Domain Vision Pipeline From RAW Images to Backend Tasks
Authors: Hao Li, Weiti Zhou, Xiangyu Zhang, Xin Lou
Published in: 2023 IEEE International Symposium on Circuits and Systems (ISCAS), 2023-05-21
DOI: 10.1109/ISCAS46773.2023.10182018 (https://doi.org/10.1109/ISCAS46773.2023.10182018)
Citations: 0
Abstract
Although high-resolution images benefit computer vision performance, they are not commonly used in convolutional neural network (CNN)-based vision algorithms because of limited memory and computation resources. Learning in the frequency domain lets CNNs accept high-resolution images directly, but the computation, time, and energy overhead of pre-processing, including image signal processing (ISP) and domain transformation, can be large. This paper explores different image processing and domain transformation operations and proposes an efficient end-to-end frequency domain learning pipeline from RAW images to vision tasks. In particular, we simplify the pre-processing stage by skipping the entire ISP pipeline and replacing the Discrete Cosine Transform (DCT) with a multiplication-free approximation. Experimental results show that the final vision performance of the proposed pipeline is very close to that of the conventional pipeline, while a significant number of redundant operations are eliminated.
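To illustrate what a multiplication-free DCT approximation can look like, the sketch below uses the signed DCT, which keeps only the sign of each entry of the 8x8 DCT-II matrix. This is a stand-in chosen for illustration: the abstract does not say which approximation the authors use, and the function names, the 8x8 block size, and the synthetic input plane are all assumptions. Because every matrix entry is +1 or -1, each output coefficient is a signed sum of input samples, so a hardware implementation would need only adders.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (floating point)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def signed_dct_matrix(n: int = 8) -> np.ndarray:
    """Signed DCT: keep only the sign of each DCT-II entry.

    Assumption: this is one known multiplication-free approximation,
    not necessarily the one used in the paper. For n = 8 no DCT entry
    is zero, so every entry of the result is +1 or -1.
    """
    return np.sign(dct_matrix(n)).astype(np.int64)


def blockwise_transform(plane: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the 2-D transform T @ X @ T.T to each n x n block of a plane.

    Returns an array of shape (H // n, W // n, n, n) holding one block of
    coefficients per spatial block. The matrix product is used here for
    brevity; with +1/-1 entries it reduces to additions and subtractions.
    """
    n = t.shape[0]
    h, w = plane.shape
    if h % n or w % n:
        raise ValueError("plane dimensions must be multiples of the block size")
    blocks = plane.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    return t @ blocks @ t.T


if __name__ == "__main__":
    # Hypothetical single-channel RAW-like plane (synthetic 10-bit values).
    rng = np.random.default_rng(0)
    raw_plane = rng.integers(0, 1024, size=(32, 32), dtype=np.int64)
    coeffs = blockwise_transform(raw_plane, signed_dct_matrix(8))
    print(coeffs.shape)
```

The signed matrix is not orthogonal, so the coefficients are scaled differently from those of the exact DCT; whether a downstream CNN tolerates this is an empirical question that the sketch does not address.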