Displays, Pub Date: 2025-02-14, DOI: 10.1016/j.displa.2025.102993
Zaka-Ud-Din Muhammad, Zhangjin Huang, Naijie Gu

PCRNet: Parent–Child Relation Network for automatic polyp segmentation

Colorectal cancer (CRC) is the third most common cancer worldwide in terms of both incidence and mortality. Its slow progression, however, makes early diagnosis and treatment effective strategies for reducing mortality. Colonoscopy is considered the standard approach for early diagnosis and treatment of the disease, yet detecting early-stage polyps remains challenging due to their diverse shapes, sizes, and camouflage properties.

To address the issues posed by the varying shapes, sizes, colors, and hazy boundaries of polyps, we propose the Parent–Child Relation Encoder Network (PCRNet), a lightweight model for automatic polyp segmentation. PCRNet comprises a parent–child encoder branch and a decoder branch equipped with a set of Boundary-aware Foreground Extraction Blocks (BFEB). The child encoder is designed to enhance feature representation while keeping model size and computational complexity low. The BFEB is introduced to accurately segment polyps of varying shapes and sizes by effectively handling hazy boundaries.

PCRNet is evaluated both quantitatively and qualitatively on five public datasets, demonstrating its effectiveness against more than a dozen state-of-the-art techniques. Our model is the most lightweight among current approaches, with only 5.0087 million parameters, and achieves the best Dice score of 0.729 on the most challenging dataset, ETIS. PCRNet also reaches an average inference rate of 36.5 fps on an Intel® Core™ i7-10700K CPU with 62 GB of memory, using a GeForce RTX 3080 (10 GB).
Displays, Pub Date: 2025-02-10, DOI: 10.1016/j.displa.2025.102983
Zhihua Shen, Fei Li, Yiqiang Wu, Xiaomao Li

Ghost-free high dynamic range imaging with shift convolution and streamlined channel transformer

High dynamic range (HDR) imaging merges multiple low dynamic range (LDR) images to generate an image with a wider dynamic range and more authentic details. However, existing HDR algorithms often produce residual ghosts due to challenges in capturing long-range dependencies in scenes with large motion and severe saturation. To address these issues, we propose an HDR deghosting method with shift convolution and a streamlined channel Transformer (SCHDRNet). Specifically, to better aggregate information across frames, we propose a pixel-shift alignment module (PSAM) to enhance the interaction of adjacent pixel features through shift convolution, improving the accuracy of the attention alignment module (AAM). Additionally, we propose a hierarchical streamlined channel Transformer (SCT) that integrates streamlined channel attention, multi-head self-attention, and channel attention blocks. This architecture effectively captures both global and local context, reducing ghosting from large motions and blurring from small movements. Extensive experiments demonstrate that our method minimizes ghosting artifacts and excels in both quantitative and qualitative aspects.
Displays, Pub Date: 2025-02-09, DOI: 10.1016/j.displa.2025.102994
Wanxuan Geng, Junfan Yi, Liang Cheng

An efficient detector for maritime search and rescue object based on unmanned aerial vehicle images

Unmanned aerial vehicle (UAV) remote sensing offers rapid response and high image resolution, which can better serve object detection for maritime search and rescue (SAR). However, obstacles remain in maritime SAR object detection based on UAV images, owing to the lack of training samples and the complex backgrounds of maritime imagery. In this study, we build a maritime search and rescue target dataset (MSRTD) based on UAV images and further propose an efficient multi-category detector named Maritime Search and Rescue-You Only Look Once network (MSR-YOLO). To reduce the influence of object scale and shooting angle, we introduce deformable convolution networks (DCN) into modules of the backbone. Coordinate Attention (CA) is added to the neck of the network to extract more powerful features, and we replace the original detection head with a decoupled detection head to better handle object recognition and localization. Finally, we use the Wise-IoU loss (WIoU) during training to reduce the influence of sample quality and help the model converge rapidly. Experiments on MSRTD confirm that the proposed MSR-YOLO achieves precision, recall, and mean average precision (mAP@0.5) of 90.00%, 68.52%, and 79.98%, respectively. Compared with other methods on a public dataset, ours also performs well and provides an effective detector for maritime SAR object detection based on UAV images.
Displays, Pub Date: 2025-02-08, DOI: 10.1016/j.displa.2025.102990
Jiantao Zhang, Bojun Ren, Yicheng Fu, Rongbo Ma, Zinuo Cai, Weishan Zhang, Ruhui Ma, Jinshan Sun

HyperTuneFaaS: A serverless framework for hyperparameter tuning in image processing models

Deep learning has achieved remarkable success across various fields, especially in image processing tasks such as denoising, sharpening, and contrast enhancement. However, the performance of these models relies heavily on the careful selection of hyperparameters, which is a computationally intensive and time-consuming task. Cloud-based hyperparameter search methods have gained popularity because they address the inefficiency of single-machine training and the underutilization of computing resources, yet they still face substantial challenges, including high computational demands, parallelism requirements, and prolonged search time.

In this study, we propose HyperTuneFaaS, a Function-as-a-Service (FaaS)-based hyperparameter search framework that leverages distributed computing and asynchronous processing to tackle these issues. By fully exploiting the parallelism offered by serverless computing, HyperTuneFaaS minimizes the overhead typically associated with model training on serverless platforms. Additionally, we enhance the traditional genetic algorithm, a powerful metaheuristic, and integrate it with the framework to make hyperparameter tuning more efficient. Experimental results demonstrate significant gains in efficiency and cost savings from combining the FaaS-based tuning framework with the optimized genetic algorithm, making HyperTuneFaaS a powerful tool for optimizing image processing models and achieving superior image quality.
Displays, Pub Date: 2025-02-07, DOI: 10.1016/j.displa.2025.102987
Yunhao Li, Sijing Wu, Yucheng Zhu, Wei Sun, Zhichao Zhang, Song Song, Guangtao Zhai

SAMR: Symmetric masked multimodal modeling for general multi-modal 3D motion retrieval

Text-to-3D human motion retrieval has recently become a hot topic in computer vision. Existing methods rely on contrastive learning and motion reconstruction as the main proxy tasks. Although they achieve strong performance, such simple strategies may cause the network to lose temporal motion information and distort text features, which can hurt retrieval results. Moreover, current motion retrieval methods ignore post-processing of the predicted similarity matrices. Considering these two problems, we present SAMR, an encoder–decoder transformer framework with symmetric masked multi-modal modeling. Concretely, we remove the KL divergence loss and reconstruct the motion and text inputs jointly. To enhance the robustness of our retrieval model, we also propose a mask modeling strategy: SAMR performs joint masking on both motion and text inputs, and during training, for each modality, we simultaneously reconstruct the original and masked inputs to stabilize training. After training, we apply dual-softmax optimization to improve the final performance. We conduct extensive experiments on both a text-to-motion dataset and a speech-to-motion dataset. The results demonstrate that SAMR achieves state-of-the-art performance in various cross-modal motion retrieval tasks, including speech-to-motion and text-to-motion, showing great potential to serve as a general foundation motion retrieval framework.
Displays, Pub Date: 2025-02-07, DOI: 10.1016/j.displa.2025.102981
Xingliang Zhu, Xiaoyu Dong, Weiwei Yu, Huawei Liang, Bin Kong

Refactored Maskformer: Refactor localization and classification for improved universal image segmentation

The introduction of DEtection TRansformers (DETR) has marked a new era for universal image segmentation in computer vision. However, methods that use shared queries and attention layers for simultaneous localization and classification often encounter inter-task optimization conflicts. In this paper, we propose a novel architecture called Refactored Maskformer, which builds upon Mask2Former through two key modifications: a Decoupler and a Reconciler. The Decoupler separates the decoding pathways for localization and classification, enabling task-specific query and attention-layer learning. It also employs a unified masked attention to confine the regions of interest for both tasks to the same object, along with a query Interactive-Attention layer to enhance task interaction. In the Reconciler module, we mitigate the optimization conflict by introducing a localization-supervised matching cost and task-alignment learning loss functions, which favor samples with high localization accuracy while reducing the influence on optimization of samples with high classification confidence but low localization accuracy. Extensive experiments demonstrate that Refactored Maskformer achieves performance comparable to existing state-of-the-art models across all unified tasks, surpassing our baseline, Mask2Former, by 1.2% PQ on COCO, 6.8% AP on ADE20K, and 1.1% mIoU on Cityscapes. The code is available at https://github.com/leonzx7/Refactored-Maskformer.
{"title":"UnSP: Improving event-to-image reconstruction with uncertainty guided self-paced learning","authors":"Jianye Yang, Xiaolin Zhang, Shaofan Wang, Yanfeng Sun, Baocai Yin","doi":"10.1016/j.displa.2025.102985","DOIUrl":"10.1016/j.displa.2025.102985","url":null,"abstract":"<div><div>Asynchronous events, produced by event cameras, possess several advantages against traditional cameras: high temporal resolution, dynamic range, etc. Traditional event-to-image reconstruction methods adopt computer vision techniques and establish a correspondence between event streams and the reconstruction image. Despite great successes, those methods ignore filtering the non-confident event frames, and hence produce unsatisfactory reconstruction results. In this paper, we propose a plug-and-play model by using uncertainty guided self-paced learning (dubbed UnSP) for finetuning the event-to-image reconstruction process. The key observation of UnSP is that, different event streams, though corresponding to a common reconstruction image, serve as different functions during the training process of event-to-image reconstruction networks (e.g., shape, intensity, details are extracted in different training phases of networks). Typically, UnSP proposes an uncertainty modeling for each event frame based on its reconstruction errors induced by three metrics, and then filters confident event frames in a self-paced learning fashion. Experiments on the six subsets of the Event Camera Dataset shows that UnSP can be incorporated with any event-to-image reconstruction networks seamlessly and achieve significant improvement in both quantitative and qualitative results. In summary, the uncertainty-driven adaptive sampling and self-learning mechanisms of UnSP, coupled with its plug-and-play capability, enhance the robustness, efficiency, and versatility for event-to-image reconstruction. Code is available at <span><span>https://github.com/wangsfan/UnSP</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102985"},"PeriodicalIF":3.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143350303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays, Pub Date: 2025-02-05, DOI: 10.1016/j.displa.2025.102989
Hao Zhang, Zenghui Liu, Zhihang Yan, Songrui Guo, ChunMing Gao, Xiyao Liu

Chinese sign language recognition and translation with virtual digital human dataset

Sign language recognition and translation are crucial for communication among individuals who are deaf or mute. Deep learning methods have advanced sign language tasks, surpassing traditional techniques in accuracy through autonomous data learning. However, the scarcity of annotated sign language datasets limits the potential of these methods in practical applications. To address this, we propose using digital twin technology to build a word-level virtual human system that automatically generates sign language sentences without human input, creating numerous sign language data pairs for efficient virtual-to-real transfer. To improve the generalization of the virtual sign language data and mitigate the bias between virtual and real data, we design novel embedding representations and augmentation methods based on skeletal information. We also establish a multi-task learning framework and a pose attention module for sign language recognition and translation. Our experiments confirm the efficacy of the approach, yielding state-of-the-art results in both recognition and translation.
Displays, Pub Date: 2025-02-04, DOI: 10.1016/j.displa.2025.102975
Lik-Hang Lee, Kit-Yung Lam, Pan Hui

Exploring user engagement by diagnosing visual guides in onboarding screens with linear regression and XGBoost

Onboarding screens are the first service point a user encounters in a new application, presenting its key functions and features. User Interface (UI) walkthroughs, product tours, and tooltips are three common categories of visual guides (VGs) in onboarding screens that help users get familiar with an app, so it is important to offer first-time users appropriate VGs that explain the key functions of the app interface. In this paper, we study which VG elements effectively help users adapt to an app's UI. We first crowd-sourced user engagement (UE) assessments and collected 7,080 responses reflecting users' cognitive preferences for 114 collected apps containing 1,194 visual guides. Our analysis of the responses shows how VGs can be improved from three perspectives (types of UI elements, semantic analysis, and spatial analysis). Accordingly, the proposed Parallel Boosted Regression Trees rate the VGs into a three-level UE score with high accuracy (85%), providing app designers with useful hints for designing VGs that achieve high levels of user retention and engagement.
Displays, Pub Date: 2025-02-03, DOI: 10.1016/j.displa.2025.102988
Yanqi Wang, Xinyue Sun, Jun Jia, Zuolin Jin, Yanning Ma

High-precision 3D teeth reconstruction based on five-view intra-oral photos

Reconstructing a 3D dental model from multi-view intra-oral photos plays an important role in orthodontic treatment. Compared with cone-beam computed tomography (CBCT) or an intra-oral scanner (IOS), such reconstruction provides a low-cost way to monitor teeth that does not require professional devices or operations. This paper introduces an enhanced, fully automated framework for 3D tooth reconstruction from five-view intra-oral photos, capable of automatically generating the shapes, alignments, and occlusal relationships of both upper and lower teeth. The proposed framework has three phases. First, a parametric dental model based on statistical shape is built to represent the shape and position of each tooth. Next, in the feature extraction stage, the Segment Anything Model (SAM) is used to accurately detect tooth boundaries in the intra-oral photos, and the single-view depth estimation approach Depth Anything is used to obtain depth information; grayscale conversion and normalization are also applied to the photos to extract luminance information and handle tooth-surface reflections. Finally, a two-stage iterative reconstruction is performed: the first stage alternates between searching for point correspondences and optimizing a composite loss function to align the parameterized tooth model with the predicted tooth contours; the second stage uses image depth and lightness information for additional refinement. Extensive experiments validate the proposed method. Compared with existing methods, it not only performs better qualitatively in misaligned, missing, or complex-occlusion cases, but also achieves good RMSD and Dice scores quantitatively.