An unsupervised video stabilization algorithm based on gyroscope image fusion

IF 2.5, CAS Zone 4 (Computer Science), JCR Q2 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Computers & Graphics-Uk. Pub Date: 2025-02-01. DOI: 10.1016/j.cag.2024.104154

Zhengwei Ren, Mingrui Zou, Lin Bi, Ming Fang

Volume 126, Article 104154. Full text: https://www.sciencedirect.com/science/article/pii/S0097849324002899

Citations: 0

Abstract

Video stabilization aims to improve visual quality by reducing the jitter and ghosting artifacts caused by camera shake, yet effectively stabilizing low-quality videos and videos of complex scenes remains a significant challenge. While gyroscope-based approaches can address this issue, they struggle with depth variations and translational shaking. In this paper, we propose a coarse-to-fine, unsupervised deep learning video stabilization solution that integrates image and gyroscope data to address these challenges. Our approach stabilizes videos under diverse conditions, manages depth changes, and handles both translational and rotational motion. We use gyroscope data to estimate the 3D camera rotation and apply an LSTM to predict stable poses. Grid-based motion parameters address depth-related motion, generating a multi-grid warping field that mitigates the significant image jitter caused by camera rotation. Subsequently, we eliminate residual motion at the pixel level. PDCNet generates confidence maps that filter the optical flow to minimize disturbances from prominent local areas, while a U-Net architecture smooths the optical flow, enabling pixel-level warping that generates finely stabilized frames. Comparative analysis shows that our approach surpasses state-of-the-art methods, particularly in handling complex scenes and achieving stability in challenging conditions.
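The rotation-compensation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera with a known intrinsic matrix, body-frame gyroscope samples at a fixed interval, and a stable pose supplied by the caller (in the paper that pose is predicted by an LSTM, which is not reproduced here). The function names `rodrigues`, `integrate_gyro`, and `stabilizing_homography` are hypothetical, and a single global homography stands in for the paper's multi-grid warping field.

```python
import numpy as np

def rodrigues(rotvec):
    """Rotation matrix for an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    # Skew-symmetric cross-product matrix of the unit axis k.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_gyro(omega, dt):
    """Accumulate body-frame angular velocities (N, 3) in rad/s.

    Returns N + 1 camera-to-world rotation matrices, starting at identity.
    Assumes a constant sample interval dt (an illustrative simplification).
    """
    rotations = [np.eye(3)]
    for w in omega:
        rotations.append(rotations[-1] @ rodrigues(np.asarray(w) * dt))
    return rotations

def stabilizing_homography(K_cam, R_raw, R_stable):
    """Homography mapping pixels seen at pose R_raw to pose R_stable.

    Valid for pure rotation only; it ignores translation and scene depth,
    which is the limitation the abstract attributes to gyroscope-only methods.
    """
    return K_cam @ R_stable.T @ R_raw @ np.linalg.inv(K_cam)

# Usage: 10 samples of rotation about the optical axis, then warp back
# to the initial pose (a "locked camera" target chosen for illustration).
omega = np.tile([0.0, 0.0, 0.5], (10, 1))
poses = integrate_gyro(omega, dt=0.01)
K_cam = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
H = stabilizing_homography(K_cam, poses[-1], np.eye(3))
```

Because the homography depends only on rotation, any residual motion from translation or depth variation survives this step, which is why the abstract follows it with grid-based and pixel-level refinement.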

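Source journal

Computers & Graphics-Uk (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 5.30
Self-citation rate: 12.00%
Articles published: 173
Review time: 38 days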

About the journal: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on: 1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains. 2. State-of-the-art papers on late-breaking, cutting-edge research on CG. 3. Information on innovative uses of graphics principles and technologies. 4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.