{"title":"A multi-view projection-based object-aware graph network for dense captioning of point clouds","authors":"Zijing Ma , Zhi Yang , Aihua Mao , Shuyi Wen , Ran Yi , Yongjin Liu","doi":"10.1016/j.cag.2024.104156","DOIUrl":"10.1016/j.cag.2024.104156","url":null,"abstract":"<div><div>3D dense captioning has received increasing attention in the multimodal field of 3D vision and language. This task aims to generate a specific descriptive sentence for each object in the 3D scene, which helps build a semantic understanding of the scene. However, due to inevitable holes in point clouds, there are often incorrect objects in the generated descriptions. Moreover, most existing models use KNN to construct relation graphs, which are not robust and have poor adaptability to different scenes. They cannot represent the relationship between the surrounding objects well. To address these challenges, in this paper, we propose a novel multi-level mixed encoding model for accurate 3D dense captioning of objects in point clouds. To handle holes in point clouds, we extract multi-view projection image features of objects based on our key observation that a hole in an object seldom exists in all projection images from different view angles. Then, the image features are fused with object detection features as the input of subsequent modules. Moreover, we combine KNN and DBSCAN clustering algorithms to construct a graph G and fuse their output features subsequently, which ensures the robustness of the graph structure for accurately describing the relationships between objects. Specifically, DBSCAN clusters are formed based on density, which alleviates the problem of using a fixed K value in KNN. Extensive experiments conducted on ScanRefer and Nr3D datasets demonstrate the effectiveness of our proposed model.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104156"},"PeriodicalIF":2.5,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143096905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly supervised semantic segmentation for ancient architecture based on multiscale adaptive fusion and spectral clustering","authors":"Ruifei Sun, Sulan Zhang, Meihong Su, Lihua Hu, Jifu Zhang","doi":"10.1016/j.cag.2025.104164","DOIUrl":"10.1016/j.cag.2025.104164","url":null,"abstract":"<div><div>Existing methods of weakly supervised semantic segmentation for ancient architecture have several limitations including difficulty in capturing decorative details and achieving precise segmentation boundaries due to the many details and complex shapes of these structures. To mitigate the effect of the above issues in ancient architecture images, this paper proposes a method for weakly supervised semantic segmentation of ancient architecture based on multiscale adaptive fusion and spectral clustering. Specifically, low-level features are able to capture localized details in an image, which helps to identify small objects. In contrast, high-level features can capture the overall shape of an object, making them more effective in recognizing large objects. We use a gating mechanism to adaptively fuse high-level and low-level features in order to retain objects of different sizes. Additionally, by employing spectral clustering, pixels in ancient architectural images can be divided into different regions based on their feature similarities. These regions serve as processing units, providing precise boundaries for class activation map (CAM) and improving segmentation accuracy. Experimental results on the Ancient Architecture, Baroque Architecture, MS COCO 2014 and PASCAL VOC 2012 datasets show that the method outperforms the existing weakly supervised methods, achieving 46.9%, 55.8%, 69.9% and 38.3% in Mean Intersection Over Union (MIOU), respectively. The code is available at <span><span>https://github.com/hao530/MASC.git</span><svg><path></path></svg></span></div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104164"},"PeriodicalIF":2.5,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143096932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CVTLayout: Automated generation of mid-scale commercial space layout via Centroidal Voronoi Tessellation","authors":"Yuntao Wang, Wenming Wu, Yue Fei, Liping Zheng","doi":"10.1016/j.cag.2025.104175","DOIUrl":"10.1016/j.cag.2025.104175","url":null,"abstract":"<div><div>The layout of commercial space is crucial for enhancing user experience and creating business value. However, designing the layout of a mid-scale commercial space remains challenging due to the need to balance rationality, functionality, and safety. In this paper, we propose a novel method that utilizes the Centroidal Voronoi Tessellation (CVT) to generate commercial space layouts automatically. Our method is a multi-level spatial division framework, where at each level, we create and optimize Voronoi diagrams to accommodate complex multi-scale boundaries. We achieve spatial division at different levels by combining the standard Voronoi diagrams with the rectangular Voronoi diagrams. Our method also leverages Voronoi diagrams’ generation controllability and division diversity, offering customized control and diversity generation that previous methods struggled to provide. Extensive experiments and comparisons show that our method offers an automated and efficient solution for generating high-quality commercial space layouts.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104175"},"PeriodicalIF":2.5,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143349627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword to the special section on visual computing for biology and medicine (VCBM 2023)","authors":"Renata G. Raidou , James B. Procter , Christian Hansen , Thomas Höllt , Daniel Jönsson","doi":"10.1016/j.cag.2025.104168","DOIUrl":"10.1016/j.cag.2025.104168","url":null,"abstract":"<div><div>This special section of the Computers and Graphics Journal (C&G) features three articles within the scope of the EG Workshop on Visual Computing for Biology and Medicine, which took place for the 13th time on September 20–22, 2023 in Norrköping, Sweden.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104168"},"PeriodicalIF":2.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143156232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword to the Special Section on Smart Tool and Applications for Graphics (STAG 2022)","authors":"Daniela Cabiddu , Teseo Schneider , Gianmarco Cherchi","doi":"10.1016/j.cag.2025.104174","DOIUrl":"10.1016/j.cag.2025.104174","url":null,"abstract":"<div><div>This special issue contains extended and revised versions of the best papers presented at the 9th Conference on Smart Tools and Applications in Graphics (STAG 2022), held in Cagliari, on November 17–18, 2022. Three papers were selected by the appointed members of the Program Committee; extended versions were submitted and further reviewed by external experts. The result is a collection of papers spanning a broad spectrum of topics, from shape analysis and computational geometry to rendering. These include areas such as shape matching, functional maps, and realistic appearance modeling, highlighting cutting-edge advancements and novel approaches in each domain.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104174"},"PeriodicalIF":2.5,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing NBA information via storylines","authors":"Jie Lin , Chuan-Kai Yang , Chiun-How Kao","doi":"10.1016/j.cag.2025.104169","DOIUrl":"10.1016/j.cag.2025.104169","url":null,"abstract":"<div><div>Sports visualization analysis is an important area within visualization studies. However, there is a lack of tools tailored for NBA writers among existing systems. Creating these tools would improve understanding of the game’s complex dynamics, particularly player interactions.</div><div>We propose a visualization system to improve understanding of complex NBA game data. Featuring multiple modules, it allows users to analyze the game from various perspectives. This paper highlights the system’s use of storylines to examine player interactions, enhancing the extraction of valuable insights. The study shows that our design enhances personalized in-game data analysis, improving the understanding and aiding in identifying critical moments.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104169"},"PeriodicalIF":2.5,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ShadoCookies: Creating user viewpoint-dependent information displays on edible cookies","authors":"Takumi Yamamoto , Takashi Amesaka , Anusha Withana , Yuta Sugiura","doi":"10.1016/j.cag.2024.104158","DOIUrl":"10.1016/j.cag.2024.104158","url":null,"abstract":"<div><div>In this paper, we propose a proof-of-concept fabrication method to transform edible cookies into information displays. The proposed method encodes the surface of cookie dough so that the displayed information changes with the viewpoint. We use a computational design method where small holes are bored into cookie dough at specific angles to create shapes that are only visible from a given perspective. This method allows for selective switching of information depending on the viewpoint position. We investigate the effects of baking time, hole depth, azimuth angle changes on the presented image, and select the appropriate hole spacing based on the number of presented images. Finally, we demonstrate the results and use cases of visualizing information on cookies.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104158"},"PeriodicalIF":2.5,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompting semantic priors for image restoration","authors":"Peigang Liu, Chenkang Wang, Yecong Wan, Penghui Lei","doi":"10.1016/j.cag.2025.104167","DOIUrl":"10.1016/j.cag.2025.104167","url":null,"abstract":"<div><div>Restoring high-quality clean images from corrupted observations, commonly referred to as image restoration, has been a longstanding challenge in the computer vision community. Existing methods often struggle to recover fine-grained contextual details due to the lack of semantic awareness of the degraded images. To overcome this limitation, we propose a novel prompt-guided semantic-aware image restoration network, termed PSAIR, which can adaptively incorporate and exploit semantic priors of degraded images and reconstruct photographically fine-grained details. Specifically, we exploit the robust degradation filtering and semantic perception capabilities of the segmentation anything model and utilize it to provide non-destructive semantic priors to aid the network’s semantic perception of the degraded images. To absorb the semantic prior, we propose a semantic fusion module that adaptively utilizes the segmentation map to modulate the features of the degraded image thereby facilitating the network to better perceive different semantic regions. Furthermore, considering that the segmentation map does not provide semantic categories, to better facilitate the network’s customized restoration of different semantics we propose a prompt-guided module which dynamically guides the restoration of different semantics via learnable visual prompts. Comprehensive experiments demonstrate that our PSAIR can restore finer contextual details and thus outperforms existing state-of-the-art methods by a large margin in terms of quantitative and qualitative evaluation.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104167"},"PeriodicalIF":2.5,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FreqSpace-NeRF: A fourier-enhanced Neural Radiance Fields method via dual-domain contrastive learning for novel view synthesis","authors":"Xiaosheng Yu , Xiaolei Tian , Jubo Chen , Ying Wang","doi":"10.1016/j.cag.2025.104171","DOIUrl":"10.1016/j.cag.2025.104171","url":null,"abstract":"<div><div>Inspired by Neural Radiance Field’s (NeRF) groundbreaking success in novel view synthesis, current methods mostly employ variants of various deep neural network architectures, and use the combination of multi-scale feature maps with the target viewpoint to synthesize novel views. However, these methods only consider spatial domain features, inevitably leading to the loss of some details and edge information. To address these issues, this paper innovatively proposes the FreqSpace-NeRF (FS-NeRF), aiming to significantly enhance the rendering fidelity and generalization performance of NeRF in complex scenes by integrating the unique advantages of spectral domain and spatial domain deep neural networks, and combining contrastive learning driven data augmentation techniques. Specifically, the core contribution of this method lies in designing a dual-stream network architecture: on one hand, capturing global frequency features through Fourier transformation; on the other hand, finely refining local details using well-established spatial domain convolutional neural networks. Moreover, to ensure the model can more acutely distinguish subtle differences between different views, we propose two loss functions: Frequency-Space Contrastive Entropy Loss (FSCE Loss) and Adaptive Spectral Contrastive Loss (ASC Loss). This innovation aims to more effectively guide the data flow and focuses on minimizing the frequency discrepancies between different views. By comprehensively utilizing the fusion of spectral and spatial domain features along with contrastive learning, FS-NeRF achieves significant performance improvements in scene reconstruction tasks. Extensive qualitative and quantitative evaluations confirm that our method surpasses current state-of-the-art (SOTA) models in this field.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104171"},"PeriodicalIF":2.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-light image enhancement via improved lightweight YUV attention network","authors":"Mohammed Y. Abbass, H. Kasban, Zeinab F. Elsharkawy","doi":"10.1016/j.cag.2025.104170","DOIUrl":"10.1016/j.cag.2025.104170","url":null,"abstract":"<div><div>Deep learning approaches have notable results in the area of computer vision applications. Our paper presents improved LYT-Net, a Lightweight YUV Transformer-based models, as an innovative method to improve low-light scenes. Unlike traditional Retinex-based methods, the proposed framework utilizes the chrominance (U and V) and luminance (Y) channels in YUV color-space, mitigating the complexity between color details and light in scenes. LYT-Net provides a thorough contextual realization of the image while keeping architecture burdens low. In order to tackle the issue of weak feature generation of traditional Channel-wise Denoiser (CWD) Block, improved CWD is proposed using Triplet Attention network. Triplet Attention network is exploited to capture both dynamics and static features. Qualitative and quantitative experiments demonstrate that the proposed technique effectively addresses images with varying exposure levels and outperforms state-of-the-art techniques. Furthermore, the proposed technique shows faster computational performance compared to other Retinex-based techniques, promoting it as a suitable option for real-time computer vision topics.</div><div>The source code is available at <span><span>https://github.com/Mohammed-Abbass/YUV-Attention</span><svg><path></path></svg></span></div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"127 ","pages":"Article 104170"},"PeriodicalIF":2.5,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143155673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}