Scalable Coding for High-Resolution, High-Compression Ratio Snapshot Compressive Video

Impact factor: 13.7

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society). Published: 2025-06-19. DOI: 10.1109/TIP.2025.3579208

Felipe Guzmán, Nelson Díaz, Bastián Romero, Esteban Vera

Volume 34, pages 3960-3970. IEEE Xplore: https://ieeexplore.ieee.org/document/11040128/

Citations: 0

Abstract



High-speed cameras are crucial for capturing fast events beyond human perception, although challenges in terms of storage, bandwidth, and cost hinder their widespread use. As an alternative, snapshot compressive video can overcome these challenges by exploiting the principles of compressed sensing to capture compressive projections of dynamic scenes into a single image, which is then used to recover the underlying video by solving an ill-posed inverse problem. However, scalability in terms of spatial and temporal resolution is limited for both acquisition and reconstruction. In this work, we leverage time-division multiplexing to design a versatile scalable coded aperture approach that allows unprecedented spatio-temporal scalability for snapshot compressive video, offering on-the-fly, high compression ratios with minimal computational burden and low memory requirements. The proposed sampling scheme is universal and compatible with any compressive temporal imaging sampling matrix and reconstruction algorithm designed for low spatio-temporal resolutions. Simulations validated with a series of experimental results confirm that we can compress up to 512 frames of 2K × 2K resolution into a single snapshot, equivalent to a compression ratio of 1/512 (about 0.2%), delivering an overall reconstruction quality exceeding 30 dB in PSNR for conventional reconstruction algorithms, and often surpassing 36 dB when utilizing the latest state-of-the-art deep learning reconstruction algorithms. The results presented in this paper can be reproduced in the following GitHub repository: https://github.com/FOGuzman/All-scalable-CACTI
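To make the acquisition model concrete, the sketch below assumes the standard coded-aperture compressive temporal imaging (CACTI) forward model y = sum over t of C_t ⊙ x_t, in which B video frames x_t are each modulated by a binary mask C_t and summed on the sensor into one snapshot y. This is the conventional single-resolution model, not the authors' scalable time-division-multiplexed coded aperture, whose details the abstract does not give; the function names snapshot_measurement, compression_ratio, and psnr are hypothetical. It also checks the abstract's arithmetic: a snapshot of 512 frames keeps 1/512 of the video samples, about 0.195%, consistent with the quoted 0.2%.

```python
import math
import random


def snapshot_measurement(frames, masks):
    """Generic CACTI model (assumed): y = sum_t C_t * x_t, elementwise.

    frames, masks: lists of B images, each a list of rows of floats (H x W).
    """
    if len(frames) != len(masks):
        raise ValueError("need one mask per frame")
    h, w = len(frames[0]), len(frames[0][0])
    y = [[0.0] * w for _ in range(h)]
    for x_t, c_t in zip(frames, masks):
        for i in range(h):
            for j in range(w):
                y[i][j] += c_t[i][j] * x_t[i][j]
    return y


def compression_ratio(num_frames):
    """A snapshot holds H*W samples of a B*H*W video, so the ratio is 1/B."""
    return 1.0 / num_frames


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    errs = [(r - e) ** 2
            for row_r, row_e in zip(reference, estimate)
            for r, e in zip(row_r, row_e)]
    mse = sum(errs) / len(errs)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = random.Random(0)
    B, H, W = 8, 4, 4  # toy sizes; the abstract reports B = 512 at 2K x 2K
    frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(B)]
    masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(B)]
    y = snapshot_measurement(frames, masks)
    print(len(y), len(y[0]))                # 4 4: a single H x W snapshot
    print(f"{compression_ratio(512):.3%}")  # 0.195%
```

Reconstruction, the ill-posed inverse problem of recovering all B frames from y, is deliberately left out; the psnr helper only shows how the quality figures quoted in the abstract (30 dB and 36 dB) are measured against a ground-truth video.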


Source journal: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)

Self-citation rate: 0.00%

Publication volume: