{"title":"利用时间亲和和扩散先验重构高质量原始视频。","authors":"Wencheng Han,Jianbing Shen,David J Crandall,Cheng-Zhong Xu","doi":"10.1109/tpami.2025.3596623","DOIUrl":null,"url":null,"abstract":"Due to the rich information and original data distribution, RAW data are widely used in many computer vision applications. However, the use of RAW video remains limited because of the high storage costs associated with data collection. Previous works have attempted to reconstruct RAW frames from sRGB data using small sampled metadata from the original RAW frames. Yet, these algorithms struggle with RAW video reconstruction due to the high computational cost of sampling metadata on cameras. To address these issues, we propose a new RAW video reconstruction pipeline that de-renders high-quality RAW videos from sRGB data using only one initial RAW frame as a reference. Specifically, we introduce three new models to achieve this goal. First, we present the Temporal-Affinity Guided De-rendering Network. This network leverages the temporal affinity between adjacent frames to construct a reference RAW image from previous RAW pixels. The corresponding RAW pixels in the previous frame provide valuable information about the original RAW data distribution, aiding in the precise reconstruction of the current frame. Second, to recover the missing RAW pixels caused by camera and foreground movement, we fully exploit the rich prior information from a pre-trained diffusion model and propose the RAW In-painting Model. This model can accurately fill in hollow regions in a RAW image based on the corresponding sRGB image and the surrounding RAW context. Lastly, we present a lightweight content-aware video clipper that automatically adjusts the clip length used for RAW video reconstruction, thereby balancing storage requirements with reconstruction quality. To better evaluate the performance of the proposed framework across different devices, we introduce the first RAW video reconstruction benchmark that comprises RAW videos from six types of camera devices with challenging scenarios. Experimental results demonstrate that our algorithm can accurately reconstruct RAW videos across all the scenarios. To facilitate further research, the code, pre-trained weight, dataset, and demo web will be publicly available at: https://um-lab.github.io/VideoRAW/.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"32 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reconstructing High Quality Raw Video Using Temporal Affinity and Diffusion Prior.\",\"authors\":\"Wencheng Han,Jianbing Shen,David J Crandall,Cheng-Zhong Xu\",\"doi\":\"10.1109/tpami.2025.3596623\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the rich information and original data distribution, RAW data are widely used in many computer vision applications. However, the use of RAW video remains limited because of the high storage costs associated with data collection. Previous works have attempted to reconstruct RAW frames from sRGB data using small sampled metadata from the original RAW frames. Yet, these algorithms struggle with RAW video reconstruction due to the high computational cost of sampling metadata on cameras. 
To address these issues, we propose a new RAW video reconstruction pipeline that de-renders high-quality RAW videos from sRGB data using only one initial RAW frame as a reference. Specifically, we introduce three new models to achieve this goal. First, we present the Temporal-Affinity Guided De-rendering Network. This network leverages the temporal affinity between adjacent frames to construct a reference RAW image from previous RAW pixels. The corresponding RAW pixels in the previous frame provide valuable information about the original RAW data distribution, aiding in the precise reconstruction of the current frame. Second, to recover the missing RAW pixels caused by camera and foreground movement, we fully exploit the rich prior information from a pre-trained diffusion model and propose the RAW In-painting Model. This model can accurately fill in hollow regions in a RAW image based on the corresponding sRGB image and the surrounding RAW context. Lastly, we present a lightweight content-aware video clipper that automatically adjusts the clip length used for RAW video reconstruction, thereby balancing storage requirements with reconstruction quality. To better evaluate the performance of the proposed framework across different devices, we introduce the first RAW video reconstruction benchmark that comprises RAW videos from six types of camera devices with challenging scenarios. Experimental results demonstrate that our algorithm can accurately reconstruct RAW videos across all the scenarios. To facilitate further research, the code, pre-trained weight, dataset, and demo web will be publicly available at: https://um-lab.github.io/VideoRAW/.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"32 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3596623\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3596623","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Reconstructing High Quality Raw Video Using Temporal Affinity and Diffusion Prior.
Abstract: Because RAW data preserve rich information and the sensor's original data distribution, they are widely used in many computer vision applications. However, the use of RAW video remains limited by the high storage costs of data collection. Previous work has attempted to reconstruct RAW frames from sRGB data using a small amount of metadata sampled from the original RAW frames, but these algorithms struggle with RAW video reconstruction because sampling metadata on-camera is computationally expensive. To address these issues, we propose a new RAW video reconstruction pipeline that de-renders high-quality RAW videos from sRGB data using only one initial RAW frame as a reference. Specifically, we introduce three new models to achieve this goal. First, we present the Temporal-Affinity Guided De-rendering Network, which leverages the temporal affinity between adjacent frames to construct a reference RAW image from pixels of the previous RAW frame; these corresponding pixels carry valuable information about the original RAW data distribution and aid the precise reconstruction of the current frame. Second, to recover RAW pixels lost to camera and foreground motion, we fully exploit the rich prior of a pre-trained diffusion model and propose the RAW In-painting Model, which accurately fills hollow regions in a RAW image based on the corresponding sRGB image and the surrounding RAW context. Lastly, we present a lightweight, content-aware video clipper that automatically adjusts the clip length used for RAW video reconstruction, balancing storage requirements against reconstruction quality. To better evaluate the performance of the proposed framework across different devices, we introduce the first RAW video reconstruction benchmark, comprising RAW videos from six types of camera devices in challenging scenarios. Experimental results demonstrate that our algorithm accurately reconstructs RAW videos across all scenarios. To facilitate further research, the code, pre-trained weights, dataset, and demo website will be publicly available at: https://um-lab.github.io/VideoRAW/.
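To make the pipeline concrete, below is a minimal Python sketch of the per-clip reconstruction loop the abstract describes. It is an illustration under stated assumptions, not the paper's implementation: a simple nearest-pixel forward warp stands in for the learned temporal-affinity module, and estimate_flow, derender, and inpaint are hypothetical placeholders for the learned components (the affinity estimator, the de-rendering network, and the diffusion-prior in-painting model). The toy clipper at the end only mimics the idea of trading extra RAW keyframes for quality.

# Hypothetical sketch of the described pipeline; not the authors' code.
import numpy as np

def warp_previous_raw(prev_raw, flow):
    """Forward-warp the previous RAW frame to the current frame.

    Returns the warped reference RAW image and a validity mask. Pixels
    that receive no source pixel are the "hollow regions" left by camera
    and foreground motion, to be filled by the in-painting model.
    """
    h, w = prev_raw.shape
    ref = np.zeros_like(prev_raw)
    valid = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    # Destination coordinates after motion, rounded to the nearest pixel.
    xd = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    ref[yd, xd] = prev_raw
    valid[yd, xd] = True
    return ref, valid

def reconstruct_clip(first_raw, srgb_frames, estimate_flow, derender, inpaint):
    """De-render one clip of sRGB frames back to RAW from a single keyframe.

    estimate_flow, derender, and inpaint stand in for the learned
    temporal-affinity, de-rendering, and RAW in-painting models.
    """
    raws = [first_raw]
    for t in range(1, len(srgb_frames)):
        flow = estimate_flow(srgb_frames[t - 1], srgb_frames[t])
        ref, valid = warp_previous_raw(raws[-1], flow)
        # Fill motion-induced holes from the sRGB frame and RAW context.
        ref = inpaint(ref, valid, srgb_frames[t])
        # Reconstruct the current RAW frame guided by the reference.
        raws.append(derender(srgb_frames[t], ref))
    return raws

def choose_clip_length(srgb_frames, estimate_flow, max_hole_ratio=0.2,
                       max_len=64):
    """Toy stand-in for the content-aware clipper: cut the clip (store a
    new RAW keyframe) once too many pixels would become holes, trading
    extra storage for reconstruction quality."""
    h, w = srgb_frames[0].shape[:2]
    valid = np.ones((h, w), dtype=bool)
    for t in range(1, min(max_len, len(srgb_frames))):
        flow = estimate_flow(srgb_frames[t - 1], srgb_frames[t])
        _, hit = warp_previous_raw(valid.astype(float), flow)
        valid &= hit
        if 1.0 - valid.mean() > max_hole_ratio:
            return t
    return min(max_len, len(srgb_frames))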
Journal Introduction:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.