Displays, Vol. 87, Article 102955 | Pub Date: 2025-01-06 | DOI: 10.1016/j.displa.2024.102955

Scoring structure regularized gradient boosting network for blind image quality assessment
Lei Wang, Qingbo Wu, Fanman Meng, Zhengning Wang, Chenhao Wu, Haoran Wei, King Ngi Ngan

Abstract: Blind image quality assessment (BIQA) aims to quantitatively predict the subjective perception of a distorted image without access to its clean reference. Prevailing methods typically model BIQA as a regression task and minimize the average prediction error under a point-wise unstructured loss such as Mean Square Error (MSE) or Mean Absolute Error (MAE), which ignores the rank orders and perceptual differences between images. This paper proposes a Scoring Structure regularized Gradient Boosting Network (SSGB-Net) to achieve a more comprehensive perception across all distorted images. More specifically, SSGB-Net performs BIQA in three stages: pair-wise rectification and list-wise boosting, followed by point-wise prediction after a linear transformation. First, the initial scores are corrected with a structured pair-wise loss, i.e., SoftRank, to preserve the perceptual rank orders of image pairs. Then, the pair-wise correction results are further boosted with a structured list-wise loss, i.e., Norm-in-Norm, to maintain the perceptual differences across all images. Finally, the point-wise prediction measures the MSE between the transformed scores and the ground truth through a closed-form solution of an Exponential Moving Average (EMA) driven linear transformation. Based on these iterative corrections, SSGB-Net can effectively balance multiple BIQA objectives, and it outperforms many state-of-the-art methods in terms of Pearson Linear Correlation Coefficient (PLCC), Spearman Rank Correlation Coefficient (SRCC) and Root Mean Squared Error (RMSE).
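The closed-form, EMA-driven linear calibration described above can be illustrated with a minimal sketch. The function name, the momentum value, and the plain least-squares form of the fit are our assumptions for illustration, not the authors' released code:

```python
def ema_linear_calibration(pred, gt, prev_ab=None, momentum=0.9):
    """Closed-form least-squares fit of gt ~ a*pred + b, with the (a, b)
    parameters optionally smoothed by an exponential moving average
    across training iterations."""
    n = len(pred)
    pm, gm = sum(pred) / n, sum(gt) / n
    # Closed-form solution of the 1-D least-squares problem.
    num = sum((p - pm) * (g - gm) for p, g in zip(pred, gt))
    den = sum((p - pm) ** 2 for p in pred)
    a = num / den
    b = gm - a * pm
    if prev_ab is not None:  # EMA update of the transformation parameters
        a = momentum * prev_ab[0] + (1 - momentum) * a
        b = momentum * prev_ab[1] + (1 - momentum) * b
    return a, b
```

Fitting `[1, 2, 3]` against ground truth `[2, 4, 6]` recovers scale 2 and shift 0; passing `prev_ab` blends those values with the previous iteration's parameters.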
Displays, Vol. 87, Article 102960 | Pub Date: 2025-01-04 | DOI: 10.1016/j.displa.2024.102960

Enhancing 3D Visual Grounding with Deformable Attention Transformer and Geometry Affine Transformation: Overcoming sparsity challenges
Can Zhang, Feipeng Da, Shaoyan Gai

Abstract: In this paper, we introduce the 3DVG-Deformable-Attention Transformer (3DVG-DT), a novel framework designed to address imprecise target object localization in 3D Visual Grounding (3DVG) caused by point cloud sparsity. By integrating a Deformable Attention Transformer (DAT) and a Geometry Affine Transformation (GAT), 3DVG-DT effectively mitigates the effects of point cloud sparsity and irregularity, significantly improving 3DVG accuracy. We propose a Dual-Mode Feature Fusion (DMF) module for object detection and matching within complex point clouds, while a Description-aware Keypoint Affine Transformation Sampling (DKAS) strategy further enhances performance. Leveraging DeBERTa-V3 for language encoding, we demonstrate the effectiveness of 3DVG-DT on the ScanRefer and Referit3D datasets, showing improved target detection under sparse point cloud conditions. Experimental results reveal substantial gains over existing methods, particularly in handling sparse point clouds.
Displays, Vol. 87, Article 102946 | Pub Date: 2025-01-03 | DOI: 10.1016/j.displa.2024.102946

SEAE: Stable end-to-end autonomous driving using event-triggered attention and exploration-driven deep reinforcement learning
Jianping Cui, Liang Yuan, Wendong Xiao, Teng Ran, Li He, Jianbo Zhang

Abstract: In self-driving cars, various unstable factors can cause significant losses, so reinforcement-learning-based driving with stability constraints is essential. The proposed multi-input stable autonomous driving method based on exploration-driven learning with attention and event-triggering (SEAE) helps the agent learn better, more stable driving policies. This paper optimizes the input processing of deep reinforcement learning with a multi-head self-attention mechanism and enhances the spatial exploration ability of the agent with an exploration-driven network. It combines acceleration stability with an event-triggered mechanism to ensure high driving safety while taking driving stability into account. More precisely, the proposed multi-input approach treats instantaneous acceleration as a constraint on the agent and optimizes the reward function while accounting for the rate of motion change. Weights are then assigned to the input data sequences through multi-head self-attention, allowing the agent to focus on the parts of the sensed environmental information that matter most for the driving task. In addition, the proposed SEAE method is compatible with the SAC and DDPG algorithms, which is used to verify its effectiveness for driving stability. The results show that the proposed method achieves the highest average reward, average episode length, driving speed and driving stability in complex autonomous driving scenarios.
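The multi-head self-attention weighting of input sequences that SEAE relies on can be sketched in stripped-down form. Omitting the learned query/key/value projection matrices is an assumption made for brevity; the paper's actual module sits inside a trained network:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, n_heads):
    """X: (seq_len, d_model). Each head attends over a slice of the
    feature dimension; Q = K = V = slice (no learned weights here)."""
    L, d = X.shape
    assert d % n_heads == 0
    dh = d // n_heads
    heads = []
    for h in range(n_heads):
        S = X[:, h * dh:(h + 1) * dh]
        w = softmax(S @ S.T / np.sqrt(dh))  # attention weights over the sequence
        heads.append(w @ S)                 # weighted mix of the sequence
    return np.concatenate(heads, axis=-1)
```

Each row of `w` sums to 1, so every output position is a convex combination of the input sequence, which is what lets the agent emphasize the most task-relevant sensor readings.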
Displays, Vol. 87, Article 102952 | Pub Date: 2025-01-03 | DOI: 10.1016/j.displa.2024.102952

IGC-Net: Integrating gated mechanism and complex-valued convolutions network for overwater object detection
Shangbo Yang, Chaofeng Li, Guanghua Fu

Abstract: In real-world overwater scenarios, detecting occluded or distant objects is a common challenge. In this paper, we first construct a novel dataset, SeaShips24790, for evaluating overwater object detectors; it includes 24,790 diverse overwater object annotations, with a particular focus on small-scale objects. We then propose IGC-Net, a new deep-learning network that integrates a gating mechanism and complex-valued convolutions, to tackle object occlusion and small-object detection in overwater scenarios. It employs the gating mechanism to selectively enhance or suppress features and incorporates complex-valued modules, including complex-valued convolutions, to fuse multi-scale feature maps. Additionally, a two-stage multi-scale feature fusion is used, comprising pre-fusion and post-fusion stages. Experimental results demonstrate that IGC-Net achieves state-of-the-art (SOTA) performance across several overwater object detection datasets. The SeaShips24790 dataset will be made available upon request.
Displays, Vol. 87, Article 102958 | Pub Date: 2025-01-02 | DOI: 10.1016/j.displa.2024.102958

Beyond visual cues: Emotion recognition in images with text-aware fusion
Kerim Serdar Sungur, Gokhan Bakal

Abstract: Sentiment analysis is a widely studied problem for understanding human emotions and their potential outcomes. While it is commonly performed on textual data, working with visual data is also critical for assessing emotional state. This work investigates whether sentiment predictions on images can be enhanced by integrating textual data as additional knowledge reflecting the contextual information of the images. Two separate models were developed, an image-processing model and a text-processing model, trained on distinct datasets covering the same five human emotions. The outputs of the individual models' last dense layers are then combined to construct a hybrid multimodal model empowered by both visual and textual components. The central question is how this hybrid model, in which textual knowledge is concatenated with visual features, performs. The hybrid model achieved nearly a 3% F1-score improvement over the plain image classification model built on a convolutional neural network architecture. In essence, this research underscores the potency of fusing textual context with visual information to refine sentiment predictions. The findings not only emphasize the potential of a multimodal approach but also point to a promising avenue for future advances in emotion analysis and understanding.
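The fusion step, concatenating the two models' last dense-layer outputs and classifying over the five emotions, can be sketched roughly as follows. The function name, feature sizes, and the plain softmax head are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def fuse_and_classify(img_feat, txt_feat, W, b):
    """Late fusion: concatenate penultimate-layer features from the image
    and text models, then apply a linear layer + softmax over emotions."""
    z = np.concatenate([img_feat, txt_feat], axis=-1) @ W + b
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)
```

In practice `W` and `b` would be trained jointly on top of the two frozen (or fine-tuned) feature extractors; here they are just the parameters of the fused classification head.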
Displays, Vol. 87, Article 102954 | Pub Date: 2025-01-02 | DOI: 10.1016/j.displa.2024.102954

Wavelet-based enhancement network for low-light image
XiaoPeng Hu, Kang Liu, Xiangchen Yin, Xin Gao, Pingsheng Jiang, Xu Qian

Abstract: Low-light images are a key challenge for high-level vision tasks and often cause failures in intelligent systems. To achieve more robust low-light enhancement and improve the downstream segmentation task, this paper proposes a wavelet-based enhancement network (WENet) that combines convolution layers and Transformer blocks. The wavelet transform separates different frequency components from the multi-scale transformation of the signal. We propose a wavelet calibrate layer (WCL), which converts features to the wavelet domain, distributes them to the corresponding areas through multiple calibration filters, and restores image details. Recognizing that noise amplification occurs alongside wavelet learning, we build a contrast adjustment layer (CAL), which refines contrast primarily through shift operations. WENet achieves superior performance on the LOL, LOLv2 and MIT-Adobe FiveK datasets, reaching 22.34 PSNR and 0.814 SSIM. We also trained WENet end-to-end with a segmentation model on the dark scenes of the ACDC dataset and achieved strong results, demonstrating robustness in low-light scenes.
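The wavelet decomposition WENet builds on, separating an image into frequency sub-bands, can be illustrated with a one-level 2D Haar transform. This is a generic sketch of the transform itself, not the authors' WCL module:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of an even-sized array.
    Returns the LL (low-frequency) and LH/HL/HH (detail) sub-bands,
    each of half the spatial resolution."""
    a = (x[0::2, :] + x[1::2, :]) / 2   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2   # vertical detail
    LL = (a[:, 0::2] + a[:, 1::2]) / 2  # smooth content
    LH = (a[:, 0::2] - a[:, 1::2]) / 2  # horizontal edges
    HL = (d[:, 0::2] + d[:, 1::2]) / 2  # vertical edges
    HH = (d[:, 0::2] - d[:, 1::2]) / 2  # diagonal detail / noise
    return LL, LH, HL, HH
```

A flat image produces zero detail bands, which is why noise and fine texture concentrate in LH/HL/HH and can be processed separately from illumination in LL.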
Displays, Vol. 87, Article 102948 | Pub Date: 2025-01-02 | DOI: 10.1016/j.displa.2024.102948

SDG-YOLOv8: Single-domain generalized object detection based on domain diversity in traffic road scenes
Huilin Wang, Huaming Qian

Abstract: Object detection is a fundamental environment-perception task in traffic road scenarios; accurate detection results are of great significance for improving the reliability of autonomous driving, improving public transportation services, and detecting traffic violations. However, domain shift between different traffic road scenarios leads to poor generalization of the detector. To overcome this problem, we propose SDG-YOLOv8, a single-domain generalized object detection algorithm based on domain diversity. First, we design a local–global transformation module that transforms the source domain into an auxiliary domain with the same annotations, increasing the domain diversity of the training data at the image level. Second, we design a normalization perturbation fusion module that implicitly changes the style of the input image, increasing the domain diversity of the training data in feature space. Finally, we design an effective training loss that further reduces the sensitivity of the detection model to domain shift and improves generalization to unseen target domains. We conducted experiments on multiple datasets covering different weather conditions, different cities, and virtual-to-real transfer; our method significantly improves detection accuracy on unknown target domains and outperforms other domain-generalized object detection algorithms.
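A normalization perturbation of the kind the second module describes, implicitly changing image "style" by jittering per-channel feature statistics, might look like the sketch below. The multiplicative Gaussian noise form and the `sigma` value are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def perturb_feature_stats(feat, sigma=0.1, rng=None):
    """Jitter the per-channel mean/std of a feature map (C, H, W).
    Since channel statistics largely encode 'style', resampling them
    simulates unseen domains while preserving content."""
    rng = rng or np.random.default_rng(0)
    C = feat.shape[0]
    mu = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    new_mu = mu * (1 + sigma * rng.standard_normal((C, 1, 1)))
    new_std = std * (1 + sigma * rng.standard_normal((C, 1, 1)))
    # Normalize with the original stats, re-scale with the perturbed ones.
    return (feat - mu) / std * new_std + new_mu
```

With `sigma=0` the map is returned unchanged; larger values yield stronger synthetic style shifts during training.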
Displays, Vol. 87, Article 102953 | Pub Date: 2025-01-02 | DOI: 10.1016/j.displa.2024.102953

A mixed-scale dynamic attention transformer for pediatric pneumonia diagnosis
Qian Chen, Lvhai Chen, Wenjie Nie, Xudong Li, Jingyuan Zheng, Jiajun Zhong, Yihua Wei, Yan Zhang, Rongrong Ji

Abstract: Pediatric pneumonia is a leading cause of morbidity and mortality in children under five, emphasizing the urgent need for automated diagnostic systems. While deep learning has shown promise in natural image classification, pediatric pneumonia imaging presents unique challenges, such as subtle symptoms, smaller anatomical structures, and the need for fine-grained feature extraction. To address this, we propose a Mixed-Scale Dynamic Attention Transformer aided by large language models (LLMs), which consists of three key modules: (1) a Dynamic Local Attention Module, which applies fine-grained attention to nearby regions and coarse-grained attention to distant areas, effectively capturing both local and global spatial dependencies; (2) a Hierarchical Multi-Scale Unit Module, which integrates and enhances multi-scale channel information, adapting to varying spatial scales to better detect subtle pneumonia-related features; and (3) an Attention Amplification Module, which leverages a frozen large language model (e.g., GPT, LLaMA) to amplify attention on critical pneumonia features using its rich semantic insight and global contextual understanding. Evaluations on pediatric chest X-ray datasets, including Pneumonia Physician, Guangzhou Women and Children's Medical Center, and NIH CXR14, demonstrate the proposed method's superior performance on key metrics such as accuracy, AUC, precision, recall, and F1-score.
Displays, Vol. 87, Article 102963 | Pub Date: 2024-12-31 | DOI: 10.1016/j.displa.2024.102963

Uniform-reference threshold-dynamic skipping for video compressive sensing
Hao Liu, Renhui Sun

Abstract: Block-based Compressive Sensing (BCS) compresses the original signal during the sampling process and thus reduces the computational burden at the encoder. BCS suits scenarios where encoder-side resources are limited, such as drone photography. Video signals exhibit high similarity between adjacent frames, so some researchers have proposed skipping blocks at the encoder under the GOP-BCS framework to further compress the data transmitted to the decoder. The reference and selection of skip-blocks affect both the reconstruction quality at the decoder and the compression ratio at the encoder. This paper proposes a Uniform-reference Threshold-dynamic Skipping (UTS) algorithm. Firstly, the algorithm sets a dynamic threshold to select skip-blocks, which suits video sequences with different degrees of motion. Secondly, in a general GOP framework, the keyframes and the prime non-keyframe in the middle are used as reference frames, so that reference frames are uniformly distributed and provide accurate skip-block references for more non-keyframes. At the same time, a high threshold is set for the prime non-keyframe's skip-block selection to ensure its reliability as a reference frame and further improve the skipping ratio. The experimental results show that, compared with state-of-the-art algorithms, the proposed algorithm achieves a higher skipping ratio and effectively reduces the energy consumption of signal transmission when the same reconstruction quality is required at the decoder.
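The skip-block selection with a dynamic threshold can be illustrated with a simple sketch. The mean-absolute-difference similarity measure and the mean-based threshold are illustrative assumptions; the paper's exact criterion may differ:

```python
import numpy as np

def select_skip_blocks(frame, ref, block=8, k=1.0):
    """Mark blocks whose mean absolute difference (MAD) w.r.t. the
    reference frame falls below a dynamic threshold, here k times the
    frame's mean block MAD, so the threshold adapts to motion level."""
    H, W = frame.shape
    mads = []
    for i in range(0, H, block):
        for j in range(0, W, block):
            mads.append(np.abs(frame[i:i + block, j:j + block] -
                               ref[i:i + block, j:j + block]).mean())
    mads = np.array(mads)
    thr = k * mads.mean()
    return mads <= thr, thr  # boolean skip map (row-major block order), threshold
```

Skipped blocks are not sampled or transmitted; the decoder copies them from the reference frame, which is why reference frames must themselves be reliable.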
{"title":"Crypto-space steganography for 3D mesh models with greedy selection and shortest expansion","authors":"Kai Gao , Ji-Hwei Horng , Ching-Chun Chang , Chin-Chen Chang","doi":"10.1016/j.displa.2024.102961","DOIUrl":"10.1016/j.displa.2024.102961","url":null,"abstract":"<div><div>Data hiding in encrypted 3D mesh models has emerged as a promising crypto-space steganography technique. However, the existing methods have the potential to improve embedding capacity due to the underutilization of the model’s topological features. In this paper, we propose an innovative greedy selection and shortest expansion strategy to select a proper reference set of vertices. Subsequently, the multi-MSB prediction and entropy coding are leveraged to further reduce the redundancy in the vertex coordinates for data embedding. By combining the new strategy and the efficient compressing of the embeddable vertices, we can raise the vertex utilization rate to approximately 90%. Experimental results show that our proposed scheme outperforms state-of-the-art methods, offering a substantial improvement in data payload for reversible data hiding in encrypted 3D mesh models.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102961"},"PeriodicalIF":3.7,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}