Depth-Induced Intra-to-Inter Transformer network for stereoscopic image retargeting

IF 5.1 | Zone 2, Engineering Technology | Q1 ENGINEERING, MULTIDISCIPLINARY
Xiaoting Fan, Long Sun, Zhong Zhang
DOI: 10.1016/j.jestch.2025.102029
Journal: Engineering Science and Technology-An International Journal-Jestech, vol. 64, Article 102029
Publication date: 2025-03-11
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S2215098625000849
Citations: 0

Abstract

With the advancement of three-dimensional visual applications, stereoscopic image editing technologies have become widely popular in both industry and entertainment. In this paper, we focus on a fundamental stereoscopic image content editing problem, i.e., stereoscopic image retargeting, which aims to adaptively transform stereoscopic images to a specific resolution with a prescribed aspect ratio. Because stereoscopic images carry additional binocular information between the left and right views, CNN-based stereoscopic image retargeting methods have obvious limitations in capturing long-range dependencies. To address these issues, we present a depth-induced intra-to-inter Transformer network (DITrans-Net) for stereoscopic image retargeting, which learns long-range dependencies within and between views through an intra-to-inter feature extraction module and aggregates the depth information of the left and right views through a depth-induced feature integration module. Specifically, the intra-to-inter feature extraction module first exploits intra-to-inter Transformer blocks to extract long-range dependency information. Furthermore, the depth-induced feature integration module employs a disparity attention learning mechanism to learn stereo correspondence and enhance the consistency of disparity variation. Finally, a hybrid loss function is applied to improve the stereoscopic image retargeting quality. Extensive experiments demonstrate that the proposed DITrans-Net achieves significant improvements and outperforms state-of-the-art methods both quantitatively and qualitatively on various benchmark datasets.
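The abstract does not spell out how the disparity attention mechanism works. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way to learn stereo correspondence with attention: for rectified stereo pairs, matching pixels lie on the same image row, so each left-view pixel can attend over the right-view pixels of that row. The function names (`row_cross_attention`, `soft_disparity`), the scaled dot-product scoring, and the toy features are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def row_cross_attention(left_row, right_row):
    """Attend from each left-view pixel to every right-view pixel on one row.

    left_row, right_row: lists of W feature vectors (each a list of C floats).
    Returns (warped, attention), where attention[i][j] is the weight left
    pixel i puts on right pixel j, and warped[i] is the attention-weighted
    mix of right-view features for left pixel i.
    """
    channels = len(left_row[0])
    scale = 1.0 / math.sqrt(channels)  # scaled dot-product scoring (assumed)
    attention, warped = [], []
    for query in left_row:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in right_row]
        weights = softmax(scores)
        attention.append(weights)
        warped.append([
            sum(w * value[c] for w, value in zip(weights, right_row))
            for c in range(channels)
        ])
    return warped, attention


def soft_disparity(attention):
    """Expected horizontal offset (i - j) under each pixel's attention."""
    return [
        sum(w * (i - j) for j, w in enumerate(weights))
        for i, weights in enumerate(attention)
    ]


# Toy scanline (hypothetical data): 4 pixels with strongly peaked one-hot features.
feats = [[10.0 if c == i else 0.0 for c in range(4)] for i in range(4)]
left_row = feats
# Right pixel j holds the content that left pixel j + 1 sees (wraps at the edge).
right_row = feats[1:] + feats[:1]

warped, attention = row_cross_attention(left_row, right_row)
disparity = soft_disparity(attention)
```

In this toy scanline, interior left pixels concentrate almost all of their attention on the right pixel one step to the left, so the soft disparity there is close to 1. A real network would apply the same idea to learned feature maps across all rows and would need to handle occluded pixels that have no valid match.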
Source journal: Engineering Science and Technology-An International Journal-Jestech
Subject area: Materials Science-Electronic, Optical and Magnetic Materials
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published: 153
Review time: 22 days
About the journal: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes high-quality theoretical and experimental papers of permanent interest, not previously published in journals, in the field of engineering and applied science, with the aim of promoting the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH covers a wide spectrum of subjects, including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)