Displays | Pub Date: 2025-08-22 | DOI: 10.1016/j.displa.2025.103185
Title: Image-based view-dependent appearance for 3D Gaussian splatting
Authors: Yanan Guo, Ying Xie, Ying Chang, Benkui Zhang, Kangning Du, Lin Cao
Journal: Displays, Volume 91, Article 103185

Abstract: 3D Gaussian Splatting (3DGS) has achieved significant progress in the field of novel view synthesis. However, learning scenes that contain specular reflections with spherical harmonics is challenging because of the high-frequency details present in such scenes. To address this problem, we propose an image-based view-dependent appearance model that jointly extracts both high- and low-frequency information from the scene, representing the appearance field of 3D Gaussians more efficiently. Specifically, by statistically assessing the dot product between the view direction and the normal at the respective Gaussian within each image, we develop a view-dependent appearance module that computes the variances of these dot products; the module adaptively assigns weights to the specular and diffuse reflection colors. We propose a normal-guided specular reflections module to extract view-dependent high-frequency information, which filters out specular colors by thresholding the variance of the dot product between the view direction and the normal. In addition, to extract low-frequency information, we design an image-based diffuse reflections module that computes the diffuse reflection colors and preserves full-frequency information. Experimental results show that our method outperforms the baseline both quantitatively and qualitatively, significantly enhancing the ability of 3DGS to handle specular reflection scenes.
Displays | Pub Date: 2025-08-22 | DOI: 10.1016/j.displa.2025.103181
Title: UTTBench: Benchmarking Large Multimodal Models for text recognition in underwater thermal turbulent environments
Authors: Hengjian Gao, Jianfeng Chen, Mohan He, Jingqi Wang, Shukun Wu, Hua Zhong, Bo Jin, Yuan Zhou, Lei Fan
Journal: Displays, Volume 91, Article 103181

Abstract: The rapid advancement of Large Multimodal Models (LMMs) has significantly expanded their potential for complex real-world applications. However, their effectiveness in extreme physical conditions, such as underwater thermal turbulence, remains understudied due to the lack of standardized evaluation benchmarks. To address this gap, we introduce the Underwater Thermal Turbulence Benchmark (UTTBench), the first comprehensive benchmark designed to evaluate text recognition in underwater thermal turbulent environments. We conduct a detailed evaluation of four popular LMMs, including LLaVA-Onevision, Qwen2.5-VL, InternVL 2.5, and DeepSeek-VL2, on this benchmark. Our experiments reveal that even advanced LMMs face substantial challenges in accurately recognizing text under thermal turbulence. This study underscores the critical need for further research to enhance the robustness and reliability of LMMs in such challenging environments.
Displays | Pub Date: 2025-08-22 | DOI: 10.1016/j.displa.2025.103192
Title: Cascade attention feature residual fusion network for iris localization and segmentation in non-cooperative environments
Authors: Shubin Guo, Ying Chen, Junkang Deng, Huiling Chen, Zhijie Chen, Changle He, Xiaodong Zhu
Journal: Displays, Volume 91, Article 103192

Abstract: Iris localization and segmentation constitute mission-critical preprocessing stages in iris recognition systems, where their precision directly governs overall recognition accuracy. However, iris images captured under non-cooperative conditions are prone to boundary distortions caused by eyelash or eyelid occlusions and defocus blurring, while texture features suffer from weakened saliency due to uneven illumination or specular reflections, leading to reduced algorithm robustness. To address these challenges, this paper proposes a cascade attention feature residual fusion network (CA-RFNet) for multitask iris localization and segmentation in unconstrained scenarios. CA-RFNet adopts an encoder-decoder structure with skip connections. In the encoder stage, deep convolutional residual blocks hierarchically extract iris texture features. A cascade attention fusion module embedded in the skip connections dynamically weights and adaptively integrates multi-receptive-field features while enabling cross-scale information complementarity. The decoder incorporates a boundary perception module with cross-layer feature interaction mechanisms to enhance fine-grained structural perception and cross-hierarchy semantic representation, thereby improving edge prediction accuracy. The CA-RFNet modules work collaboratively to overcome the adverse effects of unconstrained subject behaviors and complex environmental interference on algorithm robustness in non-cooperative scenarios. Extensive experiments on five non-cooperative iris datasets (CASIA-Iris-Distance, CASIA-Iris-Complex-Occlusion, CASIA-Iris-Complex-Off-angle, CASIA-Iris-M1, and CASIA-Iris-Africa) demonstrate that CA-RFNet achieves superior segmentation and localization performance on challenging samples with complex noise factors including occlusion, off-angle capture, illumination variation, specular reflection, dark irises, and dark skin.
Displays | Pub Date: 2025-08-21 | DOI: 10.1016/j.displa.2025.103190
Title: Full-color augmented reality display via integrated achromatic template polarization volume grating
Authors: Mingyuan Tang, Changli Sun, Weiping Ding, Feng Jiang, Yi Li, Wei Li, Jiangang Lu
Journal: Displays, Volume 91, Article 103190

Abstract: Augmented reality (AR) displays, positioned at the forefront of next-generation display technology, have attracted considerable attention for significantly enhancing information acquisition capabilities. Liquid crystal (LC) based polarization volume gratings (PVGs) are widely used as couplers for their small volume, light weight, simple structure, and high efficiency. However, the wavelength limitation and chromatic aberration of PVGs restrict their application in full-color AR displays. To meet the demand for full-color display, an achromatic template PVG (ATPVG) based on multi-templating technology is proposed. It realizes an achromatic full-color AR display via three PVG templates with different photo-alignment patterns. The integrated single-layer device enables large-angle deflection of red, green, and blue incident light with both uniform performance and substantial diffraction efficiency. Comparisons between the ATPVG and a chromatic PVG in laser diffraction and AR display experiments show that the ATPVG effectively eliminates chromatic aberration. In addition, the ATPVG can directly replace the couplers in existing AR devices without system modification.
Displays | Pub Date: 2025-08-20 | DOI: 10.1016/j.displa.2025.103179
Title: Exploiting independent query information for few-shot image segmentation
Authors: Weide Liu, Zhonghua Wu, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin, Wei Zhou
Journal: Displays, Volume 91, Article 103179

Abstract: This work addresses the challenging task of few-shot segmentation. Previous few-shot segmentation methods mainly employ information from the support images as guidance for query image segmentation. Although some works propose to build a cross-reference between support and query images, their extraction of query information still depends on the support images. In this paper, we propose to extract information from the query itself, independently of the support, to benefit the few-shot segmentation task. To this end, we first propose a prior extractor that learns query information from unlabeled images with our proposed global–local contrastive learning. We then extract a set of predetermined priors via this prior extractor. With the obtained priors, we generate prior region maps for the query images, which locate the objects, as guidance for cross-interaction with the support features. In this way, the extraction of query information is detached from the support branch, overcoming the limitations imposed by the support images, and yields more informative query clues for better interaction. Without bells and whistles, the proposed approach achieves new state-of-the-art performance for the few-shot segmentation task on public datasets.
Displays | Pub Date: 2025-08-18 | DOI: 10.1016/j.displa.2025.103183
Title: Asymmetric dual thresholds and co-occurrence relationship pseudo label generation for partial multi-label task
Authors: Jie Huang, Zhao-Min Chen, Guodao Zhang, Yisu Ge, Huiling Chen
Journal: Displays, Volume 91, Article 103183

Abstract: In real clinical settings, medical image datasets are often only partially annotated due to high labeling costs and complexity, which limits multi-label classification. Existing methods often attempt to tackle this challenge either by decoupling features to generate pseudo-labels or by treating all unknown labels as negative during training. In scenarios with severe label scarcity, the former approach may fail to generate high-quality decoupled features, while the latter is prone to introducing label noise. To address these challenges, we propose a novel method for partial multi-label medical image recognition, called Asymmetric Dual Thresholds and Co-occurrence Relationship (ADTCR). Specifically, ADTCR consists of two pseudo-label generation strategies: Asymmetric Dual Threshold (ADT) and Co-occurrence Relationship (CR). The ADT strategy initially identifies pseudo-labels by applying a lower threshold for negative pseudo-labels and a higher threshold for positive pseudo-labels, ensuring the generation of high-quality pseudo-labels. Meanwhile, the CR strategy uncovers potential positive labels by capturing label co-occurrence relationships, enabling the detection of latent positive labels among the unknown ones. Finally, to assess the model's confidence in the generated pseudo-labels, we design a Threshold-based Weighting Loss (TWL), which weights each pseudo-label with threshold-based weights, thereby further improving performance. Extensive experiments conducted on three multi-label medical image datasets, i.e., Axial Spondyloarthritis, NIH Chest X-ray 14, and ODIR-5K, demonstrate that our method achieves state-of-the-art performance.
Displays | Pub Date: 2025-08-14 | DOI: 10.1016/j.displa.2025.103178
Title: T2VEval: Benchmark dataset and objective evaluation method for T2V-generated videos
Authors: Zelu Qi, Ping Shi, Shuqi Wang, Chaoyang Zhang, Fei Zhao, Zefeng Ying, Da Pan, Xi Yang, Zheqi He, Teng Dai
Journal: Displays, Volume 91, Article 103178

Abstract: Recent advances in text-to-video (T2V) technology, as demonstrated by models such as Runway Gen-3, Pika, Sora, and Kling, have significantly broadened the applicability and popularity of the technology. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of T2V-generated videos and to optimize video generation models. However, assessing the quality of text-to-video outputs remains challenging due to the presence of highly complex distortions, such as unnatural actions and phenomena that defy human cognition. To address these challenges, we constructed T2VEval-Bench, a multi-dimensional benchmark dataset for text-to-video quality evaluation, which contains 148 textual prompts and 1,783 videos generated by 13 T2V models. To ensure a comprehensive evaluation, we scored each video on four dimensions in the subjective experiment: overall impression, text–video consistency, realness, and technical quality. Based on T2VEval-Bench, we developed T2VEval, a multi-branch fusion scheme for T2V quality evaluation. T2VEval assesses videos across three branches: text–video consistency, realness, and technical quality. Using an attention-based fusion module, T2VEval effectively integrates features from each branch and predicts scores with the aid of a large language model. Additionally, we implemented a divide-and-conquer training strategy, enabling each branch to learn targeted knowledge while maintaining synergy with the others. Experimental results demonstrate that T2VEval achieves state-of-the-art performance across multiple metrics.
Displays | Pub Date: 2025-08-14 | DOI: 10.1016/j.displa.2025.103177
Title: Lat3D: Generating 3D assets in industrial paradigm via lattice deformation
Authors: Xiaoyang Huang, Bingbing Ni, Wenjun Zhang
Journal: Displays, Volume 91, Article 103177

Abstract: Automatic generation of 3D assets is one of the most promising future applications of AIGC. At present, however, the prevailing 3D representations in AIGC still differ substantially from those used in common 3D design software, which prevents coherent collaboration between machine generation and manual operation. To address this issue, we propose a 3D asset creation framework, Lat3D, which focuses on the Lattice representation, a representation compatible with mainstream 3D design software. The framework builds on a transformer network and distance-based matching to enable differentiable generation and supervision for lattices. To resolve the problem of biased error expectation in lattice matching, we leverage importance sampling to convert the deformed point sets into a uniform distribution. Besides, to activate vanishing lattices during optimization, we explicitly direct the enclosed lattices toward high-error regions with a well-designed distance function. Our framework produces lattices that are semantically decomposed, systematically structured, and closely aligned with modeling conventions. With our Blender plugin, the generated lattices can be seamlessly imported into Blender projects for further 3D workflows. We conduct shape auto-encoding and single-view reconstruction experiments to evaluate the quality of the created 3D assets.
{"title":"Reversible steganography in cipher domain for JPEG images using polynomial homomorphism","authors":"Shuying Xu , Ji-Hwei Horng , Ching-Chun Chang , Chin-Chen Chang","doi":"10.1016/j.displa.2025.103184","DOIUrl":"10.1016/j.displa.2025.103184","url":null,"abstract":"<div><div>Reversible steganography in the cipher domain enables secret communication while preserving the privacy of the cover media. In this paper, we present a novel reversible steganographic method for JPEG images in cipher domain. The direct current (DC) coefficients are encrypted in a manner that maintains both the JPEG format and file size. The alternating current (AC) coefficients and the secret data are first encoded into codewords over a Galois field using Hamming code. The resulting codewords are then independently encrypted via polynomial-based secret sharing. Subsequently, the shares of secret data are embedded into the shares of AC coefficients using polynomial homomorphism. To ensuring reliability of image reconstruction and secret extraction, a cheating detection code is employed to authenticate the marked shares. Experimental findings demonstrate that our scheme outperforms the state-of-the-art methods in terms of embedding capacity while preserving the file size and format.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103184"},"PeriodicalIF":3.4,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144851865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-08-12 | DOI: 10.1016/j.displa.2025.103180
Title: Research on interaction optimization methods for AR games in autonomous driving: Based on contextual awareness and virtual avatar design
Authors: Jianmin Wang, Zhenyu Wang, Qianwen Fu, Ye Wang, Fang You
Journal: Displays, Volume 91, Article 103180

Abstract: With the continuous development of artificial intelligence and autonomous driving technology, cars are transitioning from traditional transportation to user-centered intelligent mobile spaces. To enhance drivers' road safety awareness and entertainment experience, this study explores the design of multi-modal games for smart cockpits in the context of L3-L4 autonomous driving, based on the theory of Situation Awareness (SA). First, using the SA and KANO models, key scenarios and design elements are identified, and a method combining context awareness and avatar design is proposed. Second, typical driving scenarios are identified through user interviews and evaluation scales, and experimental studies are conducted to analyze and refine the multi-modal game interaction information in the smart cockpit environment. Finally, a Pokémon-style AR game design solution suitable for the smart cockpit was developed. A 2 (with vs. without game interaction) × 4 (vision; vision + hearing; vision + touch; vision + hearing + touch) mixed factorial experiment was designed, and the proposed design was evaluated in a driving simulator. The experimental results demonstrate that, compared with the non-game design, the game interaction design for L3-L4 autonomous driving conditions significantly enhances drivers' perception of road and environmental information, reduces reaction time in emergencies, improves user experience, and increases trust in the autonomous driving system.