{"title":"Multidimensional image morphing-fast image-based rendering of open 3D and VR environments","authors":"Simon Seibt , Bastian Kuth , Bartosz von Rymon Lipinski , Thomas Chang , Marc Erich Latoschik","doi":"10.1016/j.vrih.2023.06.007","DOIUrl":"10.1016/j.vrih.2023.06.007","url":null,"abstract":"<div><h3>Background</h3><div>In recent years, the demand for interactive photorealistic three-dimensional (3D) environments has increased in various fields, including architecture, engineering, and entertainment. However, achieving a balance between the quality and efficiency of high-performance 3D applications and virtual reality (VR) remains challenging.</div></div><div><h3>Methods</h3><div>This study addresses this issue by revisiting and extending view interpolation for image-based rendering (IBR), which enables the exploration of spacious open environments in 3D and VR. Therefore, we introduce multimorphing, a novel rendering method based on the spatial data structure of 2D image patches, called the image graph. Using this approach, novel views can be rendered with up to six degrees of freedom using only a sparse set of views. The rendering process does not require 3D reconstruction of the geometry or per-pixel depth information, and all relevant data for the output are extracted from the local morphing cells of the image graph. The detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real-time. In addition, a GPU-based solution was presented to resolve exposure inconsistencies within a dataset, enabling seamless transitions of brightness when moving between areas with varying light intensities.</div></div><div><h3>Results</h3><div>Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high \"VR-compatible\" frame rates, even on mid-range and legacy hardware, respectively. While achieving adequate visual quality even for sparse datasets, it outperforms other IBR and current neural rendering approaches.</div></div><div><h3>Conclusions</h3><div>Using the correspondence-based decomposition of input images into morphing cells of 2D image patches, multidimensional image morphing provides high-performance novel view generation, supporting open 3D and VR environments. Nevertheless, the handling of morphing artifacts in the parallax image regions remains a topic for future research.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 155-172"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STDNet: Improved lip reading via short-term temporal dependency modeling","authors":"Xiaoer Wu , Zhenhua Tan , Ziwei Cheng , Yuran Ru","doi":"10.1016/j.vrih.2024.07.003","DOIUrl":"10.1016/j.vrih.2024.07.003","url":null,"abstract":"<div><h3>Background</h3><div>Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance in current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves space for further improvement in feature extraction.</div></div><div><h3>Methods</h3><div>This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames while not affecting spatial feature extraction. In particular, we designed a local–temporal block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the Global-Temporal Block, which processes a single frame as an independent unitto learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.</div></div><div><h3>Results</h3><div>Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.</div></div><div><h3>Conclusions</h3><div>The proposed model effectively addresses short-term temporal dependency limitations in lip reading, and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 173-187"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation of CAD models using hybrid representation","authors":"Claude Uwimana , Shengdi Zhou , Limei Yang , Zhuqing Li , Norbelt Mutagisha , Edouard Niyongabo , Bin Zhou","doi":"10.1016/j.vrih.2025.01.001","DOIUrl":"10.1016/j.vrih.2025.01.001","url":null,"abstract":"<div><div>In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation by concatenating meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, as well as introduce a unique hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced-neural-network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, it aims to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 188-202"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds","authors":"Xiongjie Yin , Jinquan He , Zhanglin Cheng","doi":"10.1016/j.vrih.2025.02.001","DOIUrl":"10.1016/j.vrih.2025.02.001","url":null,"abstract":"<div><div>Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds, and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We addressed the limitations of line clouds for plane detection and reconstruction by using a new algorithm. This algorithm projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure an accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method, demonstrating its superiority over existing techniques in terms of simplicity and efficiency.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 111-126"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deconfounded fashion image captioning with transformer and multimodal retrieval","authors":"Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu","doi":"10.1016/j.vrih.2024.08.002","DOIUrl":"10.1016/j.vrih.2024.08.002","url":null,"abstract":"<div><h3>Background</h3><div>The annotation of fashion images is a significantly important task in the fashion industry as well as social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.</div></div><div><h3>Method</h3><div>In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.</div></div><div><h3>Results</h3><div>Overall, our method can not only effectively enrich the captions of target images, but also greatly reduce confounders caused by the dataset. To verify the effectiveness of the proposed framework, the model was experimentally verified using the FACAD dataset.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 127-138"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepSafe:Two-level deep learning approach for disaster victims detection","authors":"Amir Azizi , Panayiotis Charalambous , Yiorgos Chrysanthou","doi":"10.1016/j.vrih.2024.08.005","DOIUrl":"10.1016/j.vrih.2024.08.005","url":null,"abstract":"<div><h3>Background</h3><div>Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.</div></div><div><h3>Methods</h3><div>In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.</div></div><div><h3>Results</h3><div>Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.</div></div><div><h3>Conclusion</h3><div>DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 139-154"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chasing in virtual environment:Dynamic alignment for multi-user collaborative redirected walking","authors":"Tianyang Dong, Shuqian Lv, Hubin Kong, Huanbo Zhang","doi":"10.1016/j.vrih.2024.07.002","DOIUrl":"10.1016/j.vrih.2024.07.002","url":null,"abstract":"<div><h3>Background</h3><div>The redirected walking (RDW) method for multi-user collaboration requires maintaining the relative position between users in a virtual environment (VE) and physical environment (PE). A chasing game in a VE is a typical virtual reality game that entails multi-user collaboration. When a user approaches and interacts with a target user in the VE, the user is expected to approach and interact with the target user in the corresponding PE as well. Existing methods of multi-user RDW mainly focus on obstacle avoidance, which does not account for the relative positional relationship between the users in both VE and PE.</div></div><div><h3>Methods</h3><div>To enhance the user experience and facilitate potential interaction, this paper presents a novel dynamic alignment algorithm for multi-user collaborative redirected walking (DA-RDW) in a shared PE where the target user and other users are moving. This algorithm adopts improved artificial potential fields, where the repulsive force is a function of the relative position and velocity of the user with respect to dynamic obstacles. For the best alignment, this algorithm sets the alignment-guidance force in several cases and then converts it into a constrained optimization problem to obtain the optimal direction. Moreover, this algorithm introduces a potential interaction object selection strategy for a dynamically uncertain environment to speed up the subsequent alignment. To balance obstacle avoidance and alignment, this algorithm uses the dynamic weightings of the virtual and physical distances between users and the target to determine the resultant force vector.</div></div><div><h3>Results</h3><div>The efficacy of the proposed method was evaluated using a series of simulations and live-user experiments. The experimental results demonstrate that our novel dynamic alignment method for multi-user collaborative redirected walking can reduce the distance error in both VE and PE to improve alignment with fewer collisions.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 26-46"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing wireless sensor network topology with node load consideration","authors":"Ruizhi Chen","doi":"10.1016/j.vrih.2024.08.003","DOIUrl":"10.1016/j.vrih.2024.08.003","url":null,"abstract":"<div><h3>Background</h3><div>With the development of the Internet, the topology optimization of wireless sensor networks has received increasing attention. However, traditional optimization methods often overlook the energy imbalance caused by node loads, which affects network performance.</div></div><div><h3>Methods</h3><div>To improve the overall performance and efficiency of wireless sensor networks, a new method for optimizing the wireless sensor network topology based on K-means clustering and firefly algorithms is proposed. The K-means clustering algorithm partitions nodes by minimizing the within-cluster variance, while the firefly algorithm is an optimization algorithm based on swarm intelligence that simulates the flashing interaction between fireflies to guide the search process. The proposed method first introduces the K-means clustering algorithm to cluster nodes and then introduces a firefly algorithm to dynamically adjust the nodes.</div></div><div><h3>Results</h3><div>The results showed that the average clustering accuracies in the Wine and Iris data sets were 86.59% and 94.55%, respectively, demonstrating good clustering performance. When calculating the node mortality rate and network load balancing standard deviation, the proposed algorithm showed dead nodes at approximately 50 iterations, with an average load balancing standard deviation of 1.7×10<sup>4</sup>, proving its contribution to extending the network lifespan.</div></div><div><h3>Conclusions</h3><div>This demonstrates the superiority of the proposed algorithm in significantly improving the energy efficiency and load balancing of wireless sensor networks to extend the network lifespan. The research results indicate that wireless sensor networks have theoretical and practical significance in fields such as monitoring, healthcare, and agriculture.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 47-61"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finger tracking for wearable VR glove using flexible rack mechanism","authors":"Roshan Thilakarathna, Maroay Phlernjai","doi":"10.1016/j.vrih.2024.03.001","DOIUrl":"10.1016/j.vrih.2024.03.001","url":null,"abstract":"<div><h3>Background</h3><div>With the increasing prominence of hand and finger motion tracking in virtual reality (VR) applications and rehabilitation studies, data gloves have emerged as a prevalent solution. In this study, we developed an innovative, lightweight, and detachable data glove tailored for finger motion tracking in VR environments.</div></div><div><h3>Methods</h3><div>The glove design incorporates a potentiometer coupled with a flexible rack and pinion gear system, facilitating precise and natural hand gestures for interaction with VR applications. Initially, we calibrated the potentiometer to align with the actual finger bending angle, and verified the accuracy of angle measurements recorded by the data glove. To verify the precision and reliability of our data glove, we conducted repeatability testing for flexion (grip test) and extension (flat test), with 250 measurements each, across five users. We employed the Gage Repeatability and Reproducibility to analyze and interpret the repeatable data. Furthermore, we integrated the gloves into a SteamVR home environment using the OpenGlove auto-calibration tool.</div></div><div><h3>Conclusions</h3><div>The repeatability analysis revealed an aggregate error of 1.45 degrees in both the gripped and flat hand positions. This outcome was notably favorable when compared with the findings from assessments of nine alternative data gloves that employed similar protocols. In these experiments, users navigated and engaged with virtual objects, underlining the glove's exact tracking of finger motion. Furthermore, the proposed data glove exhibited a low response time of 17–34 ms and back-drive force of only 0.19 N. Additionally, according to a comfort evaluation using the Comfort Rating Scales, the proposed glove system is wearable, placing it at the WL1 level.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 1-25"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FDCPNet:feature discrimination and context propagation network for 3D shape representation","authors":"Weimin SHI, Yuan XIONG, Qianwen WANG, Han JIANG, Zhong ZHOU","doi":"10.1016/j.vrih.2024.06.001","DOIUrl":"10.1016/j.vrih.2024.06.001","url":null,"abstract":"<div><h3>Background</h3><div>Three-dimensional (3D) shape representation using mesh data is essential in various applications, such as virtual reality and simulation technologies. Current methods for extracting features from mesh edges or faces struggle with complex 3D models because edge-based approaches miss global contexts and face-based methods overlook variations in adjacent areas, which affects the overall precision. To address these issues, we propose the Feature Discrimination and Context Propagation Network (FDCPNet), which is a novel approach that synergistically integrates local and global features in mesh datasets.</div></div><div><h3>Methods</h3><div>FDCPNet is composed of two modules: (1) the Feature Discrimination Module, which employs an attention mechanism to enhance the identification of key local features, and (2) the Context Propagation Module, which enriches key local features by integrating global contextual information, thereby facilitating a more detailed and comprehensive representation of crucial areas within the mesh model.</div></div><div><h3>Results</h3><div>Experiments on popular datasets validated the effectiveness of FDCPNet, showing an improvement in the classification accuracy over the baseline MeshNet. Furthermore, even with reduced mesh face numbers and limited training data, FDCPNet achieved promising results, demonstrating its robustness in scenarios of variable complexity.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 83-94"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}