An Efficient Frequency Domain Vision Pipeline From RAW Images to Backend Tasks
Hao Li, Weiti Zhou, Xiangyu Zhang, Xin Lou
2023 IEEE International Symposium on Circuits and Systems (ISCAS), published 2023-05-21
DOI: 10.1109/ISCAS46773.2023.10182018 (https://doi.org/10.1109/ISCAS46773.2023.10182018)
Although high-resolution images benefit computer vision performance, they are not commonly used in convolutional neural network (CNN)-based vision algorithms because memory and computation resources are limited. Learning in the frequency domain lets CNNs accept high-resolution images directly, but the computation, time, and energy overhead of pre-processing, including image signal processing (ISP) and domain transformation, can be large. This paper explores different image processing and domain transformation operations and proposes an efficient end-to-end frequency domain learning pipeline from RAW images to vision tasks. In particular, we simplify the pre-processing stage by skipping the entire ISP pipeline and replacing the Discrete Cosine Transform (DCT) with a multiplication-free approximation. Experimental results show that the final vision performance of the proposed pipeline is very close to that of the conventional pipeline, while a significant number of redundant operations are eliminated.
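
The abstract does not state which multiplication-free DCT approximation is used, so the following is only an illustrative sketch under stated assumptions. It replaces each entry of the 8x8 DCT-II matrix with its sign, which leaves a matrix of +1 and -1 values, so the transform can in principle be computed with additions and subtractions alone. The function names, the 8x8 block size, and the grouping of each block's 64 coefficients into channels are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal n x n DCT-II matrix (floating point)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index, one per row
    i = np.arange(n).reshape(1, -1)  # sample index, one per column
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def sign_dct_matrix(n: int = 8) -> np.ndarray:
    """Illustrative approximation: keep only the sign of each DCT entry.

    For n = 8 no DCT entry is zero, so every entry becomes +1 or -1.
    """
    return np.sign(dct_matrix(n)).astype(np.int64)


def approx_dct_blocks(image: np.ndarray, n: int = 8) -> np.ndarray:
    """Apply the sign-based transform to each n x n block of a 2-D image.

    Returns an array of shape (H/n, W/n, n*n): one vector of n*n
    coefficients per block (an assumed layout, chosen for illustration).
    """
    image = np.asarray(image, dtype=np.int64)
    h, w = image.shape
    if h % n or w % n:
        raise ValueError("image height and width must be multiples of n")
    t = sign_dct_matrix(n)
    # Split into blocks: (H/n, n, W/n, n) -> (H/n, W/n, n, n).
    blocks = image.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # NumPy performs ordinary integer multiplications here; because t holds
    # only +1 and -1, dedicated hardware could use adders and subtractors.
    coeffs = t @ blocks @ t.T
    return coeffs.reshape(h // n, w // n, n * n)


if __name__ == "__main__":
    # A single-channel stand-in for a RAW frame; real RAW data would be
    # a Bayer mosaic read from the sensor.
    frame = np.random.default_rng(0).integers(0, 1024, size=(64, 96))
    print(approx_dct_blocks(frame).shape)
```

Unlike the exact DCT, the sign-based matrix is not orthonormal, so the coefficients are scaled differently from true DCT coefficients. Whether that matters depends on the downstream network, which in a learned pipeline can absorb a fixed scaling.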