Virtual Reality Intelligent Hardware: Latest Articles

Designing social immersive virtual environments for the Metaverse: The case study of MetaLibrary

Virtual Reality Intelligent Hardware Pub Date : 2025-06-01 DOI: 10.1016/j.vrih.2025.04.002

Alberto CANNAVÒ, Giorgio ARRIGO, Alessandro VISCONTI, Federico De LORENZIS, Fabrizio LAMBERTI

Background: Over the last few years, rapid technological advancement has led to many approaches to digitalization. In this respect, the metaverse provides persistent 3D virtual environments that can be used to access digital content, meet virtually, and perform a range of professional and leisure tasks. Among the numerous technologies supporting the metaverse, immersive Virtual Reality (VR) plays a primary role and offers highly interactive social experiences. Despite growing interest in this area, there are no clear design guidelines for creating environments tailored to the metaverse.

Methods: This study seeks to advance research in this area by starting from state-of-the-art studies on the design of immersive virtual environments in the context of the metaverse and proposing how to integrate cutting-edge technologies within this context. Best practices were identified by i) analyzing literature studies focused on human behavior in immersive virtual environments, ii) extracting common features of existing social VR platforms, and iii) conducting interviews with experts in a specific application domain. As a case study, the work considered the creation of a new virtual environment for MetaLibrary, a VR-based social platform aimed at integrating public libraries into the metaverse. Several implementation challenges and additional requirements were identified for the development of virtual environments (VEs). These elements were considered in the selection of specific cutting-edge technologies and their integration into the development process. A user study was also conducted to investigate some design aspects (namely lighting conditions and richness of the scene layout) for which clear indications could not be derived from the above analysis because different alternative configurations could be chosen.

Results: The work reported in this paper seeks to bridge the gap between existing VR platforms and related literature in the field, on the one hand, and requirements regarding immersive virtual environments for the metaverse, on the other, by reporting a set of best practices that were used to build a social virtual environment that meets users' expectations and needs.

Conclusions: Results suggest that carefully designed virtual environments can positively affect user experience and interaction within the metaverse. The insights gained from this study offer valuable cues for developing immersive virtual environments for the metaverse to deliver more effective and engaging experiences.

Virtual Reality Intelligent Hardware, Volume 7, Issue 3, Pages 279-298. Journal Article.

Citations: 0

Optimal load frequency control system for two-area connected via AC/DC link using cuckoo search algorithm

Virtual Reality Intelligent Hardware Pub Date : 2025-06-01 DOI: 10.1016/j.vrih.2025.03.006

Gaber EL-SAADY, Alexey MIKHAYLOV, Nora BARANYAI, Mahrous AHMED, Mahmoud HEMEIDA

Background: Interconnection of different power systems has a major effect on system stability. This study aims to design an optimal load frequency control (LFC) system based on a proportional-integral (PI) controller for a two-area power system.

Methods: Two areas were connected through an AC tie line in parallel with a DC link to stabilize the frequency of oscillations in both areas. The PI parameters were tuned using the cuckoo search algorithm (CSA) to minimize the integral absolute error (IAE). A state matrix was provided, and the stability of the system was verified by calculating the eigenvalues. The frequency response was investigated for load variation, changes in the generator rate constraint, the turbine time constant, and the governor time constant.

Results: The CSA was compared with the particle swarm optimization (PSO) algorithm under identical conditions. The system was modeled based on a state-space mathematical representation and simulated using MATLAB. The results demonstrated the effectiveness of the proposed controller with both algorithms and indicated that CSA is superior to PSO.

Conclusion: The CSA smooths the system response, reduces ripples, decreases overshoot and settling time, and improves the overall system performance under different disturbances.

Virtual Reality Intelligent Hardware, Volume 7, Issue 3, Pages 299-316. Journal Article.
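
As a rough illustration of the tuning objective named in the abstract above, the following sketch computes the integral absolute error (IAE) of a PI loop around a toy first-order plant. The plant model, the function name `iae_for_pi`, and every parameter value are assumptions made for illustration only; this is not the two-area AC/DC model from the paper, and the cuckoo search itself is not implemented here.

```python
def iae_for_pi(kp, ki, tau=0.5, load_step=0.1, dt=0.001, t_end=10.0):
    """Integral absolute error of a PI loop around a toy first-order plant.

    Hypothetical plant (illustration only):
        tau * dy/dt = -y + u - load_step
    where y is the frequency deviation and the setpoint is y = 0.
    """
    y = 0.0      # frequency deviation (plant output)
    integ = 0.0  # running integral of the error, used by the I term
    iae = 0.0    # accumulated integral of |error|
    for _ in range(int(t_end / dt)):
        error = 0.0 - y                       # setpoint minus output
        integ += error * dt                   # rectangle-rule integration
        u = kp * error + ki * integ           # PI control law
        y += dt * (-y + u - load_step) / tau  # forward-Euler plant update
        iae += abs(error) * dt
    return iae


if __name__ == "__main__":
    # Compare two candidate gain pairs; the smaller IAE is the better one.
    for gains in [(1.0, 0.2), (1.0, 2.0)]:
        print(gains, iae_for_pi(*gains))
```

A population-based optimizer such as CSA or PSO would call a function of this kind as its cost, searching over `(kp, ki)` for the pair with the smallest IAE.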

Citations: 0

Effects of immersive virtual reality-based exercise on quality of life, stress, anxiety, depression, and handgrip strength in fibromyalgia: A pilot study

Virtual Reality Intelligent Hardware Pub Date : 2025-06-01 DOI: 10.1016/j.vrih.2025.03.004

Gonzalo ARIAS-ÁLVAREZ, Carla GUZMÁN-PINCHEIRA, Diego GONZÁLEZ-GONZÁLEZ, Waldo OSORIO-TORRES, Daniel PECOS-MARTÍN, José GÓMEZ-PULIDO, Claudio CARVAJAL-PARODI

Background: Fibromyalgia (FM) is a chronic rheumatic disorder characterised by musculoskeletal pain, fatigue, and psychoemotional symptoms. Virtual reality (VR) has proven to be an innovative and motivating tool for managing FM, with several studies indicating that it can improve quality of life indices and reduce psychoemotional symptoms. However, studies on immersive VR-based exercise (iVRE) are limited.

Methods: The aim of this study was to evaluate the effects of iVRE on quality of life, stress, anxiety, depression, and handgrip strength in patients with FM. A single-arm pre-post-test pilot study was conducted. Individuals diagnosed with FM were recruited using convenience sampling. The iVRE protocol consisted of 12 sessions of 10 min warm-up and 15 min exercises applied with the Oculus Quest 2™ device. The impact on quality of life was assessed using the Revised Fibromyalgia Impact Questionnaire, and the effects on stress, anxiety, and depression were determined using the Depression Anxiety Stress Scale-21 questionnaire. Handgrip strength was evaluated using the Baseline® dynamometer. The normality assumption was evaluated, and the pre-post means were compared using Student's t-test (p < 0.05).

Results: Eleven individuals (40.6 ± 11.2 years) completed the protocol (10 women). There were significant differences in favour of iVRE in quality of life impact (p < 0.001, Cohen's d: 1.48), handgrip strength (p < 0.05, Cohen's d: 0.26), depression (p < 0.05, Cohen's d: 0.73), and anxiety (p < 0.05, Cohen's d: 0.73).

Conclusions: A six-week iVRE program significantly reduces the impact on quality of life, anxiety, and depression and improves handgrip strength in people with FM. Future studies should investigate the physiological effects using systemic biomarkers to explain the scope of this therapeutic modality.

Virtual Reality Intelligent Hardware, Volume 7, Issue 3, Pages 267-278. Journal Article.
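
For readers unfamiliar with the statistics reported above, the sketch below shows one common way to compute a paired t statistic and Cohen's d for pre-post measurements. The abstract does not say which Cohen's d variant was used, so the pooled-standard-deviation form here is an assumption. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [after - before for before, after in zip(pre, post)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def cohens_d_pooled(pre, post):
    """Cohen's d using the pooled SD of the two measurements (assumed variant)."""
    pooled_sd = math.sqrt(
        (statistics.variance(pre) + statistics.variance(post)) / 2
    )
    return (statistics.mean(post) - statistics.mean(pre)) / pooled_sd


if __name__ == "__main__":
    # Made-up scores for four hypothetical participants.
    pre_scores = [1.0, 2.0, 3.0, 4.0]
    post_scores = [2.0, 4.0, 4.0, 6.0]
    print(paired_t_statistic(pre_scores, post_scores))
    print(cohens_d_pooled(pre_scores, post_scores))
```

The t statistic would then be compared against a Student's t distribution with n - 1 degrees of freedom to obtain the p-value; that lookup is omitted to keep the sketch within the standard library.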

Citations: 0

A new human-computer interaction paradigm: Agent interaction model based on large models and its prospects

Virtual Reality Intelligent Hardware Pub Date : 2025-06-01 DOI: 10.1016/j.vrih.2025.04.001

Yang LIU

Abstract: This study examines the advent of agent interaction (AIx) as a transformative paradigm in human-computer interaction (HCI), signifying a notable evolution beyond traditional graphical interfaces and touchscreen interactions. Within the context of large models, AIx is characterized by its innovative interaction patterns and a wide range of promising application scenarios. The paper highlights the pivotal role of AIx in shaping the future landscape of the large model industry, emphasizing its adoption and necessity from a user's perspective. The fundamental drivers of AIx include the introduction of novel capabilities, replication of capabilities (both anthropomorphic and superhuman), migration of capabilities, aggregation of intelligence, and multiplication of capabilities. These elements are essential for propelling innovation, expanding the frontiers of capability, and realizing the exponential superposition of capabilities, thereby mitigating labor redundancy and addressing a spectrum of human needs. Furthermore, this study provides an in-depth analysis of the structural components and operational mechanisms of agents supported by large models. Such advancements significantly enhance the capacity of agents to tackle complex problems and provide intelligent services, thereby facilitating a more intuitive, adaptive, and personalized engagement between humans and machines. The study further delineates four principal categories of interaction patterns that encompass eight distinct modalities of interaction, corresponding to twenty-one specific scenarios, including applications in smart home systems, health assistance, and elderly care. This emphasizes the significance of the new paradigm in advancing HCI, fostering technological advancements, and redefining user experiences. However, the study also acknowledges the challenges and ethical considerations that accompany this paradigm shift, recognizing the need for a balanced approach to harness the full potential of AIx in modern society.

Virtual Reality Intelligent Hardware, Volume 7, Issue 3, Pages 237-266. Journal Article.

Citations: 0

Advanced driver assistance system (ADAS) and machine learning (ML): The dynamic duo revolutionizing the automotive industry

Virtual Reality Intelligent Hardware Pub Date : 2025-06-01 DOI: 10.1016/j.vrih.2025.01.002

Harsh SHAH, Karan SHAH, Kushagra DARJI, Adit SHAH, Manan SHAH

Abstract: The advanced driver assistance system (ADAS) primarily serves to assist drivers in monitoring the speed of the car and helps them make the right decision, which leads to fewer fatal accidents and ensures higher safety. In the artificial intelligence domain, machine learning (ML) was developed to make inferences with a degree of accuracy similar to that of humans; however, enormous amounts of data are required. Machine learning enhances the accuracy of the decisions taken by ADAS by evaluating all the data received from various vehicle sensors. This study summarizes all the critical algorithms used in ADAS technologies and presents the evolution of ADAS technology. Initially, ADAS technology is introduced, along with its evolution, to understand the objectives of developing this technology. Subsequently, the critical algorithms used in ADAS technology, which include face detection, head-pose estimation, gaze estimation, and link detection, are discussed. A further discussion follows on the impact of ML on each algorithm in different environments, leading to increased accuracy at the expense of additional computing, to increase efficiency. The aim of this study was to evaluate all the methods with or without ML for each algorithm.

Virtual Reality Intelligent Hardware, Volume 7, Issue 3, Pages 203-236. Journal Article.

Citations: 0

Multidimensional image morphing-fast image-based rendering of open 3D and VR environments

Virtual Reality Intelligent Hardware Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2023.06.007

Simon Seibt, Bastian Kuth, Bartosz von Rymon Lipinski, Thomas Chang, Marc Erich Latoschik

Background: In recent years, the demand for interactive photorealistic three-dimensional (3D) environments has increased in various fields, including architecture, engineering, and entertainment. However, achieving a balance between the quality and efficiency of high-performance 3D applications and virtual reality (VR) remains challenging.

Methods: This study addresses this issue by revisiting and extending view interpolation for image-based rendering (IBR), which enables the exploration of spacious open environments in 3D and VR. To this end, we introduce multimorphing, a novel rendering method based on a spatial data structure of 2D image patches called the image graph. Using this approach, novel views can be rendered with up to six degrees of freedom using only a sparse set of views. The rendering process does not require 3D reconstruction of the geometry or per-pixel depth information, and all relevant data for the output are extracted from the local morphing cells of the image graph. The detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real time. In addition, a GPU-based solution is presented to resolve exposure inconsistencies within a dataset, enabling seamless transitions of brightness when moving between areas with varying light intensities.

Results: Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high "VR-compatible" frame rates, even on mid-range and legacy hardware. While achieving adequate visual quality even for sparse datasets, it outperforms other IBR and current neural rendering approaches.

Conclusions: Using the correspondence-based decomposition of input images into morphing cells of 2D image patches, multidimensional image morphing provides high-performance novel view generation, supporting open 3D and VR environments. Nevertheless, the handling of morphing artifacts in the parallax image regions remains a topic for future research.

Virtual Reality Intelligent Hardware, Volume 7, Issue 2, Pages 155-172. Journal Article.

Citations: 0

STDNet: Improved lip reading via short-term temporal dependency modeling

Virtual Reality Intelligent Hardware Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.07.003

Xiaoer Wu, Zhenhua Tan, Ziwei Cheng, Yuran Ru

Background: Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance on current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves room for further improvement in feature extraction.

Methods: This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames without affecting spatial feature extraction. In particular, we designed a local-temporal block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the global-temporal block, which processes a single frame as an independent unit to learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.

Results: Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000 datasets, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.

Conclusions: The proposed model effectively addresses short-term temporal dependency limitations in lip reading and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.

Virtual Reality Intelligent Hardware, Volume 7, Issue 2, Pages 173-187. Journal Article.
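
To make the idea of attention pooling over frames concrete, here is a minimal sketch in plain Python. It scores each per-frame feature vector against a query vector, turns the scores into softmax weights, and returns the weighted sum. The dot-product scoring, the function name `attention_pool`, and the toy inputs are assumptions for illustration; the abstract does not specify how STDNet computes its attention weights.

```python
import math


def attention_pool(frames, query):
    """Softmax-weighted sum of per-frame feature vectors.

    frames: list of equal-length feature vectors, one per video frame.
    query:  vector of the same length (learned in a real model; fixed here).
    Returns (pooled_vector, weights).
    """
    # Relevance score of each frame: dot product with the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]

    # Numerically stable softmax over the frame axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frames, one feature dimension at a time.
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    pooled, weights = attention_pool(toy_frames, query=[2.0, 0.0])
    print(weights)  # the first frame receives the largest weight
    print(pooled)
```

Frames whose features align with the query dominate the pooled vector, which is the sense in which attention pooling can emphasize the frames most informative for the target word.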

Citations: 0

Segmentation of CAD models using hybrid representation

Virtual Reality Intelligent Hardware Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.01.001

Claude Uwimana, Shengdi Zhou, Limei Yang, Zhuqing Li, Norbelt Mutagisha, Edouard Niyongabo, Bin Zhou

Abstract: In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation by concatenating meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, and we introduce a hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced neural network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, it aims to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.

Virtual Reality Intelligent Hardware, Volume 7, Issue 2, Pages 188-202. Journal Article.

Citations: 0

Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds

Virtual Reality Intelligent Hardware Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.02.001

Xiongjie Yin, Jinquan He, Zhanglin Cheng

Citations: 0

Deconfounded fashion image captioning with transformer and multimodal retrieval

Virtual Reality Intelligent Hardware Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.08.002

Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu

Background: The annotation of fashion images is a significantly important task in the fashion industry as well as social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.

Method: In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.

Results: Overall, our method can not only effectively enrich the captions of target images, but also greatly reduce confounders caused by the dataset. To verify the effectiveness of the proposed framework, the model was experimentally verified using the FACAD dataset.

Virtual Reality Intelligent Hardware, Volume 7, Issue 2, Pages 127-138. Journal Article.

Citations: 0