Bernardo Marques, Beatriz Sousa Santos, Paulo Dias. "Ten years of immersive education: Overview of a Virtual and Augmented Reality course at postgraduate level." Computers & Graphics, vol. 124, Article 104088, 2024-09-20. DOI: 10.1016/j.cag.2024.104088
Abstract: In recent years, the market has seen the emergence of numerous affordable sensors, interaction devices, and displays, which have greatly facilitated the adoption of Virtual and Augmented Reality (VR/AR) across various applications. However, developing these applications requires a solid understanding of the field and specific technical skills, which are often lacking in current Computer Science and Engineering education programs. This work is an extended version of a Eurographics 2024 Education Paper, reporting on a postgraduate-level course that has been taught for the past ten years to almost 200 students across several Master's programs. The course introduces students to the fundamental principles, methods, and tools of VR/AR. Its primary objective is to equip students with the knowledge necessary to understand, create, implement, and evaluate applications using these technologies. The paper provides insights into the course structure, key topics covered, assessment methods, and the devices and infrastructure utilized. It also includes an overview of various practical projects completed over the years. Among other reflections, we discuss the challenges of teaching this course, particularly due to the rapid evolution of the field, which necessitates constant updates to the curriculum. Finally, future perspectives for the course are outlined.
Dimeng Zhang, JiaYao Li, Zilong Chen, Yuntao Zou. "Efficient image generation with Contour Wavelet Diffusion." Computers & Graphics, vol. 124, Article 104087, 2024-09-20. DOI: 10.1016/j.cag.2024.104087
Abstract: The burgeoning field of image generation has captivated academia and industry with its potential to produce high-quality images, facilitating applications like text-to-image conversion, image translation, and recovery. These advancements have notably propelled the growth of the metaverse, where virtual environments constructed from generated images offer new interactive experiences, especially in conjunction with digital libraries. The technology creates detailed high-quality images, enabling immersive experiences. Despite diffusion models showing promise with superior image quality and mode coverage over GANs, their slow training and inference speeds have hindered broader adoption. To counter this, we introduce the Contour Wavelet Diffusion Model, which accelerates the process by decomposing features and employing multi-directional, anisotropic analysis. This model integrates an attention mechanism to focus on high-frequency details and a reconstruction loss function to ensure image consistency and accelerate convergence. The result is a significant reduction in training and inference times without sacrificing image quality, making diffusion models viable for large-scale applications and enhancing their practicality in the evolving digital landscape.
Alberto Cannavò, Francesco Bottino, Fabrizio Lamberti. "Supporting motion-capture acting with collaborative Mixed Reality." Computers & Graphics, vol. 124, Article 104090, 2024-09-19. DOI: 10.1016/j.cag.2024.104090
Abstract: Technologies such as chroma-key, LED walls, motion capture (mocap), 3D visual storyboards, and simulcams are revolutionizing how films featuring visual effects are produced. Despite their popularity, these technologies have introduced new challenges for actors. Actors face an increased workload when digital characters are animated via mocap, since they must use their imagination to envision what the characters see and do on set. This work investigates how Mixed Reality (MR) technology can support actors during mocap sessions by presenting a collaborative MR system named CoMR-MoCap, which allows actors to rehearse scenes by overlaying digital content onto the real set. Using a Video See-Through Head-Mounted Display (VST-HMD), actors can see digital representations of performers in mocap suits and digital scene content in real time. The system supports collaboration, enabling multiple actors to wear both mocap suits to animate digital characters and VST-HMDs to visualize the digital content. A user study involving 24 participants compared CoMR-MoCap to the traditional method using physical props and visual cues. The results showed that CoMR-MoCap significantly improved actors' ability to position themselves and direct their gaze, and it offered advantages in terms of usability, spatial and social presence, embodiment, and perceived effectiveness over the traditional method.
{"title":"LightingFormer: Transformer-CNN hybrid network for low-light image enhancement","authors":"Cong Bi , Wenhua Qian , Jinde Cao , Xue Wang","doi":"10.1016/j.cag.2024.104089","DOIUrl":"10.1016/j.cag.2024.104089","url":null,"abstract":"<div><div>Recent deep-learning methods have shown promising results in low-light image enhancement. However, current methods often suffer from noise and artifacts, and most are based on convolutional neural networks, which have limitations in capturing long-range dependencies resulting in insufficient recovery of extremely dark parts in low-light images. To tackle these issues, this paper proposes a novel Transformer-based low-light image enhancement network called LightingFormer. Specifically, we propose a novel Transformer-CNN hybrid block that captures global and local information via mixed attention. It combines the advantages of the Transformer in capturing long-range dependencies and the advantages of CNNs in extracting low-level features and enhancing locality to recover extremely dark parts and enhance local details in low-light images. Moreover, we adopt the U-Net discriminator to enhance different regions in low-light images adaptively, avoiding overexposure or underexposure, and suppressing noise and artifacts. Extensive experiments show that our method outperforms the state-of-the-art methods quantitatively and qualitatively. Furthermore, the application to object detection demonstrates the potential of our method in high-level vision tasks.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104089"},"PeriodicalIF":2.5,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APE-GAN: A colorization method for focal areas of infrared images guided by an improved attention mask mechanism","authors":"Wenchao Ren, Liangfu Li, Shiyi Wen, Lingmei Ai","doi":"10.1016/j.cag.2024.104086","DOIUrl":"10.1016/j.cag.2024.104086","url":null,"abstract":"<div><div>Due to their minimal susceptibility to environmental changes, infrared images are widely applicable across various fields, particularly in the realm of traffic. Nonetheless, a common drawback of infrared images lies in their limited chroma and detail information, posing challenges for clear information retrieval. While extensive research has been conducted on colorizing infrared images in recent years, existing methods primarily focus on overall translation without adequately addressing the foreground area containing crucial details. To address this issue, we propose a novel approach that distinguishes and colors the foreground content with important information and the background content with less significant details separately before fusing them into a colored image. Consequently, we introduce an enhanced generative adversarial network based on Attention mask to meticulously translate the foreground content containing vital information more comprehensively. Furthermore, we have carefully designed a new composite loss function to optimize high-level detail generation and improve image colorization at a finer granularity. Detailed testing on IRVI datasets validates the effectiveness of our proposed method in solving the problem of infrared image coloring.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104086"},"PeriodicalIF":2.5,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenshu Li, Yinliang Chen, Xiaoying Guo, Xiaoyu He. "ST2SI: Image Style Transfer via Vision Transformer using Spatial Interaction." Computers & Graphics, vol. 124, Article 104084, 2024-09-16. DOI: 10.1016/j.cag.2024.104084
Abstract: While retaining the original content structure, image style transfer renders a content image with a style image to obtain stylized images with artistic features. Because the content image contains different detail units and the style image has various style patterns, the stylized image is easily distorted. We propose a new Style Transfer method based on a Vision Transformer using Spatial Interaction (ST2SI), which takes advantage of Spatial Interactive Convolution (SIC) and Spatial Unit Attention (SUA) to further enhance the content and style representation, so that the encoder can not only better learn the features of the content domain and the style domain, but also maintain the structural integrity of the image content and the effective integration of style features. Concretely, the high-order spatial interaction ability of Spatial Interactive Convolution can capture complex style patterns, and Spatial Unit Attention can balance the content information of different detail units through changes in attention weights, thus solving the problem of image distortion. Comprehensive qualitative and quantitative experiments prove the efficacy of our approach.
Carsten Görg, Connor Elkhill, Jasmine Chaij, Kristin Royalty, Phuong D. Nguyen, Brooke French, Ines A. Cruz-Guerrero, Antonio R. Porras. "SHAPE: A visual computing pipeline for interactive landmarking of 3D photograms and patient reporting for assessing craniosynostosis." Computers & Graphics, vol. 125, Article 104056, 2024-09-12. DOI: 10.1016/j.cag.2024.104056
Abstract: 3D photogrammetry is a cost-effective, non-invasive imaging modality that does not require the use of ionizing radiation or sedation. Therefore, it is specifically valuable in pediatrics and is used to support the diagnosis and longitudinal study of craniofacial developmental pathologies such as craniosynostosis, the premature fusion of one or more cranial sutures resulting in local cranial growth restrictions and cranial malformations. Analysis of 3D photogrammetry requires the identification of craniofacial landmarks to segment the head surface and compute metrics to quantify anomalies. Unfortunately, commercial 3D photogrammetry software requires intensive manual landmark placements, which is time-consuming and prone to errors. We designed and implemented SHAPE, a System for Head-shape Analysis and Pediatric Evaluation. It integrates our previously developed automated landmarking method in a visual computing pipeline to evaluate a patient's 3D photogram while allowing for manual confirmation and correction. It also automatically computes advanced metrics to quantify craniofacial anomalies and automatically creates a report that can be uploaded to the patient's electronic health record. We conducted a user study with a professional clinical photographer to compare SHAPE to the existing clinical workflow. We found that SHAPE allows for the evaluation of a craniofacial 3D photogram more than three times faster than the current clinical workflow (3.85 ± 0.99 vs. 13.07 ± 5.29 minutes, p < 0.001). Our qualitative study findings indicate that the SHAPE workflow is well aligned with the existing clinical workflow and that SHAPE has useful features and is easy to learn.
Peng Zhang, Jiamei Zhan, Kexin Sun, Jie Zhang, Meng Wei, Kexin Wang. "GIC-Flow: Appearance flow estimation via global information correlation for virtual try-on under large deformation." Computers & Graphics, vol. 124, Article 104071, 2024-09-12. DOI: 10.1016/j.cag.2024.104071
Abstract: The primary aim of image-based virtual try-on is to seamlessly deform the target garment image to align with the human body. Owing to the inherent non-rigid nature of garments, current methods prioritise flexible deformation through appearance flow with high degrees of freedom. However, existing appearance flow estimation methods solely focus on the correlation of local feature information. While this strategy successfully avoids the extensive computational effort associated with the direct computation of the global information correlation of feature maps, it leads to challenges in garments adapting to large deformation scenarios. To overcome these limitations, we propose the GIC-Flow framework, which obtains appearance flow by calculating the global information correlation while reducing computational regression. Specifically, our proposed global streak information matching module is designed to decompose the appearance flow into horizontal and vertical vectors, effectively propagating global information in both directions. This innovative approach considerably diminishes computational requirements, contributing to an enhanced and efficient process. In addition, to ensure the accurate deformation of local texture in garments, we propose the local aggregate information matching module to aggregate information from the nearest neighbours before computing the global correlation and to enhance weak semantic information. Comprehensive experiments conducted using our method on the VITON and VITON-HD datasets show that GIC-Flow outperforms existing state-of-the-art algorithms, particularly in cases involving complex garment deformation.
Chuan Jin, Tieru Wu, Yu-Shen Liu, Junsheng Zhou. "MuSic-UDF: Learning Multi-Scale dynamic grid representation for high-fidelity surface reconstruction from point clouds." Computers & Graphics, vol. 124, Article 104081, 2024-09-10. DOI: 10.1016/j.cag.2024.104081
Abstract: Surface reconstruction for point clouds is a central task in 3D modeling. Recently, attractive approaches solve this problem by learning neural implicit representations, e.g., unsigned distance functions (UDFs), from point clouds, and have achieved good performance. However, existing UDF-based methods still struggle to recover local geometric details. One of the difficulties arises from the inflexible representations used, which make it hard to capture local high-fidelity geometric details. In this paper, we propose a novel neural implicit representation, named MuSic-UDF, which leverages Multi-Scale dynamic grids for high-fidelity and flexible surface reconstruction from raw point clouds with arbitrary topologies. Specifically, we initialize a hierarchical voxel grid where each grid point stores a learnable 3D coordinate. Then, we optimize these grids such that different levels of geometric structure can be captured adaptively. To further explore the geometric details, we introduce a frequency encoding strategy to hierarchically encode these coordinates. MuSic-UDF does not require any supervision such as ground-truth distance values or point normals. We conduct comprehensive experiments on widely-used benchmarks, where the results demonstrate the superior performance of our proposed method compared to state-of-the-art methods.