Multimodal Fusion Image Stabilization Algorithm for Bio-Inspired Flapping-Wing Aircraft

Zhikai Wang, Sen Wang, Yiwen Hu, Yangfan Zhou, Na Li, Xiaofeng Zhang

Journal: Biomimetics, vol. 10, no. 7 (JCR Q1, Engineering, Multidisciplinary; impact factor 3.9)
Published: 2025-07-07
DOI: 10.3390/biomimetics10070448
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12292680/pdf/
Citations: 0
Abstract
This paper presents FWStab, a specialized video stabilization dataset tailored for flapping-wing platforms. The dataset encompasses five typical flight scenarios, featuring 48 video clips with intense dynamic jitter. The corresponding Inertial Measurement Unit (IMU) sensor data are synchronously collected, jointly providing reliable support for multimodal modeling. Building on this dataset, and to address the poor image acquisition quality caused by severe vibrations in aerial vehicles, this paper proposes a multimodal signal fusion video stabilization framework. The framework integrates image features and inertial sensor features to predict smooth, stable camera poses. During stabilization, the raw camera motion estimated from the sensors is warped onto the smooth trajectory predicted by the network, thereby improving inter-frame stability. This approach maintains the global rigidity of scene motion, avoids the visual artifacts caused by traditional dense optical flow-based spatiotemporal warping, and rectifies rolling shutter-induced distortions. Furthermore, the network is trained in an unsupervised manner using a joint loss function that combines camera pose smoothness and optical flow residuals. Coupled with a multi-stage training strategy, the framework demonstrates strong stabilization adaptability across a wide range of scenarios. The framework employs Long Short-Term Memory (LSTM) networks to model the temporal characteristics of camera trajectories, enabling high-precision prediction of smooth trajectories.
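The core idea in the abstract, warping the raw sensor-estimated camera motion onto a predicted smooth trajectory, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, the LSTM trajectory predictor is replaced by a simple moving-average smoother, and camera motion is modeled as rotation only, so the per-frame correction reduces to the homography H = K · R_smooth · R_raw⁻¹ · K⁻¹ for intrinsics K.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation from Euler angles (radians), applied in ZYX order."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def smooth_trajectory(angles, window=9):
    """Moving-average smoother over per-frame Euler angles (N x 3).

    Stand-in for the paper's learned LSTM trajectory predictor.
    """
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(angles, ((pad, pad), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, i], kernel, mode="valid")
         for i in range(angles.shape[1])],
        axis=1,
    )

def stabilizing_homography(raw_angles, smooth_angles, K):
    """Warp that moves the raw camera pose onto the smoothed pose.

    Rotation-only model: H = K @ R_smooth @ R_raw^T @ K^-1
    (R^T is the inverse of a rotation matrix).
    """
    R_raw = rotation_matrix(*raw_angles)
    R_smooth = rotation_matrix(*smooth_angles)
    return K @ R_smooth @ R_raw.T @ np.linalg.inv(K)
```

In a full pipeline, the per-frame angles would come from integrating the IMU gyroscope, and each homography would be applied to its frame (e.g. with `cv2.warpPerspective`); here those steps are omitted for brevity.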