Displays | Pub Date: 2025-09-30 | DOI: 10.1016/j.displa.2025.103233
Shurong Chai, Rahul Kumar Jain, Shiyu Teng, Jiaqing Liu, Tomoko Tateyama, Yen-Wei Chen
{"title":"A module selection-based approach for efficient skeleton human action recognition","authors":"Shurong Chai , Rahul Kumar Jain , Shiyu Teng , Jiaqing Liu , Tomoko Tateyama , Yen-Wei Chen","doi":"10.1016/j.displa.2025.103233","DOIUrl":"10.1016/j.displa.2025.103233","url":null,"abstract":"<div><div>Human action recognition has become a key aspect of human–computer interaction nowadays. Existing spatial–temporal networks-based human action recognition methods have achieved better performance but at the high cost of computational complexity. These methods make the final predictions using a stack of blocks, where each block contains a spatial and a temporal module for extracting the respective features. Whereas an alternative arrangement of these blocks in the network may affect the optimal configuration for each specific sample. Moreover, these methods need a high inference time, consequently their implementation on cutting-edge low-spec devices is challenging. To resolve these limitations, we propose a decision network-based adaptive framework that dynamically determines the arrangement of the spatial and temporal modules to ensure a cost-effective network design. To determine the optimal network structure, we have investigated module selection decision-making schemes at local and global level. We have conducted extensive experiments using three publicly available datasets. The results show our proposed framework arranges the modules in an optimal way and efficiently reduces the computation cost while maintaining the performance. Our code is available at <span><span>https://github.com/11yxk/dynamic_skeleton</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103233"},"PeriodicalIF":3.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation","authors":"Shujun Gao , Xiaomei Yu , Xiao Liang , Xuanchi Chen , Xiangwei Zheng","doi":"10.1016/j.displa.2025.103234","DOIUrl":"10.1016/j.displa.2025.103234","url":null,"abstract":"<div><div>Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries present in gastroscopic images significantly decrease the clinical performance of existing recognition approaches. To address above challenges, we propose a Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation (GASR) for the identification of five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module for leveraging the effective complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA) inspired by the Atrous Spatial Pyramid Pooling (ASPP) to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation demonstrate that GASR achieves a recognition accuracy of 97.15%, with robust boundary discrimination and strong generalization performance and can be accepted in clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103234"},"PeriodicalIF":3.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-29 | DOI: 10.1016/j.displa.2025.103246
Zhixian Tang, Zhentao Yang, Xucheng Cai, Zhuocheng Li, Ling Wei, Pengfei Fan, Xufeng Yao
{"title":"CellKAN: Cellular multi-attention Kolmogorov-Arnold networks for nuclei segmentation in histopathology images","authors":"Zhixian Tang , Zhentao Yang , Xucheng Cai , Zhuocheng Li , Ling Wei , Pengfei Fan , Xufeng Yao","doi":"10.1016/j.displa.2025.103246","DOIUrl":"10.1016/j.displa.2025.103246","url":null,"abstract":"<div><div>This paper presents CellKAN, a novel medical image segmentation network for nuclei detection in histopathological images. The model integrates a Multi-Scale Conv Block (MSCB), Hybrid Multi-Dimensional Attention (HMDA) mechanism, and Kolmogorov-Arnold Network Block (KAN-Block) to address challenges like missed tiny lesions, heterogeneous morphology parsing, and low-contrast boundary inaccuracies. MSCB enhances multi-scale feature extraction via hierarchical refinement, while HMDA captures cross-channel-spatial dependencies through 3D convolution and dual-path pooling. KAN-Block replaces linear weights with learnable nonlinear functions, enhancing model interpretability and reducing the number of parameters. Evaluated on MoNuSeg, PanNuke, and an In-house gastrointestinal dataset, CellKAN achieves Dice coefficients of 82.91 %, 83.50 %, and 71.38 %, outperforming state-of-the-art models (e.g., U-KAN, nnUNet) by 1.29–4.49 %. Ablation studies verify that MSCB and HMDA contribute 0.35 % and 0.48 % Dice improvements on PanNuke, respectively. The model also reduces parameters compared to nnUNet while maintaining high accuracy, balancing precision and efficiency. Visual results demonstrate its superiority in noise suppression, boundary delineation, and structural integrity, highlighting its potential for clinical pathological analysis.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103246"},"PeriodicalIF":3.4,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-26 | DOI: 10.1016/j.displa.2025.103196
Senhao Du, Yu Huang, Qiwen Yuan, Yongliang Dai, Zhendong Shi, Menghan Hu
{"title":"Rule-augmented LLM framework for detecting unreasonableness in ICU","authors":"Senhao Du , Yu Huang , Qiwen Yuan , Yongliang Dai , Zhendong Shi , Menghan Hu","doi":"10.1016/j.displa.2025.103196","DOIUrl":"10.1016/j.displa.2025.103196","url":null,"abstract":"<div><div>This paper proposes a rule-augmented model system for detecting unreasonable activities in Intensive Care Unit (ICU) hospitalization, mainly leveraging a large language model (LLM). The system is built on DeepSeek-R1-32B and integrates existing unreasonable activities in ICU hospitalization into health insurance systems through prompt learning techniques. Compared to traditional fixed-threshold rules, the large model augmented with rules possesses the ability to identify errors and exhibits a certain degree of emergent capabilities. In addition, it provides detailed and interpretable explanations for detected unreasonableness, helping the health insurance fund supervision perform efficient and accurate reviews. The framework includes two main sub-models: a discriminator for rule judgment, and an evaluator accuracy enhancement. Training data were derived from anonymized records from multiple hospitals and pre-processed to form the first domestic dataset tailored to unreasonable ICU billing detection tasks. The experimental results validate the effectiveness and practical value of the proposed system.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103196"},"PeriodicalIF":3.4,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-25 | DOI: 10.1016/j.displa.2025.103232
Kairui Zhang, Xiao Ke, Xin Chen
{"title":"Dual-stage attention based symmetric framework for stereo video quality assessment","authors":"Kairui Zhang , Xiao Ke , Xin Chen","doi":"10.1016/j.displa.2025.103232","DOIUrl":"10.1016/j.displa.2025.103232","url":null,"abstract":"<div><div>The compelling creative capabilities of stereo video have captured the attention of scholars towards its quality. Given the substantial challenge posed by asymmetric distortion in stereoscopic visual perception within the realm of stereoscopic video quality evaluation (SVQA), this study introduces the novel <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> (Dual Branch, dual-stage Attention, Dual Task) framework for stereoscopic video quality assessment. Leveraging its innovative dual-task architecture, <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> employs a dual-branch independent prediction mechanism for the left and right views. This approach not only effectively addresses the prevalent issue of asymmetric distortion in stereoscopic videos but also pinpoints which view drags the overall score down. To surmount the limitations of existing models in capturing global detail attention, <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> incorporates a two-stage distorted attention fusion module. This module enables multi-level fusion of video features at both block and pixel levels, bolstering the model’s attention towards global details and its processing capabilities, consequently enhancing the overall performance of the model. <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> has exhibited exceptional performance across mainstream and cross-domain datasets, establishing itself as the current state-of-the-art (SOTA) technology.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103232"},"PeriodicalIF":3.4,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145157453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-23 | DOI: 10.1016/j.displa.2025.103230
Lirong Zhang, Lei Zhou, Zhong Zheng, Zhaohua Zhou, Miao Xu, Lei Wang, Weijing Wu, Junbiao Peng
{"title":"Metal oxide TFTs gate driver and analog PWM pixel circuit employing progressive slope-compensated ramp signal for micro-LED displays","authors":"Lirong Zhang , Lei Zhou , Zhong Zheng , Zhaohua Zhou , Miao Xu , Lei Wang , Weijing Wu , Junbiao Peng","doi":"10.1016/j.displa.2025.103230","DOIUrl":"10.1016/j.displa.2025.103230","url":null,"abstract":"<div><div>A new metal oxide thin film transistors (MO TFT) gate driver has been presented for micro light-emitting diode (Micro-LED) displays with line-by-line driving method, where progressive and adjustable slope-compensated ramp signals are employed into each row of pixel array. A compensated analog pulse width modulation (PWM) pixel circuit is presented to construct the Micro-LED driving framework. This proposed gate driver with one input module and three output modules provides all the control signals for pixel array without any external integrated circuits (ICs), which simplifying the driving system. The experimented results show that the gate driver outputs integrated signals, including SCAN, EM and PWM. And the pixel circuit with single Micro-LED chip could achieve different grayscale levels from (100 to 3000 cd/m<sup>2</sup>), successfully. The slope and current of Micro-LED (<em>I<sub>LED</sub></em>) can be adjusted by applying an external bias, where the slope ranges from −0.35 to −0.57 within a bias range of −6 to −7 V, while <em>I<sub>LED</sub></em> varies from 17.3 to 61.7 μA under a bias range of 3 to 9 V. Then, the error rate of slope and brightness can achieve within 2 % and 5 % with <em>V<sub>th</sub></em> shift of about ±0.7 V after undergoing 1.5 h positive and negative bias stress test of TFT, respectively. Moreover, the proposed gate driver and pixel circuit have been verified to operate normally at high speeds with SCAN output width of 8.68 us, 6.51 us and 4.32 us, which is suitable for high-resolution Micro-LED displays.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103230"},"PeriodicalIF":3.4,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145157451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-23 | DOI: 10.1016/j.displa.2025.103224
Jianbo Zhang, Liang Yuan, Teng Ran, Jun Jia, Shuo Yang, Long Tang
{"title":"Less is more: An effective method to extract object features for visual dynamic SLAM","authors":"Jianbo Zhang , Liang Yuan , Teng Ran , Jun Jia , Shuo Yang , Long Tang","doi":"10.1016/j.displa.2025.103224","DOIUrl":"10.1016/j.displa.2025.103224","url":null,"abstract":"<div><div>Visual Simultaneous Localization and Mapping (VSLAM) is an essential foundation in augmented reality (AR) and mobile robotics. Dynamic scenes in the real world are a main challenge for VSLAM because it contravenes the fundamental assumptions based on static environments. Joint pose optimization with dynamic object modeling and camera pose estimation is a novel approach. However, it is challenging to model the motion of both the camera and the dynamic object when they are moving simultaneously. In this paper, we propose an efficient feature extraction approach for modeling dynamic object motion. We describe the object comprehensively through a more optimal feature selection strategy, which improves the performance of object tracking and pose estimation. The proposed approach combines image gradients and feature point clustering on dynamic objects. In the back-end optimization stage, we introduce rigid constraints on the dynamic object to optimize the poses using the graph model and obtain a high accuracy. The experimental results on the KITTI datasets demonstrate that the performance of the proposed approach is efficient and accurate.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103224"},"PeriodicalIF":3.4,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145157452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-output compact gate driver circuit design with embedded combinational logic for oxide TFT-based AMOLED displays","authors":"Pu Liang , Yuxuan Zhu , Haohang Zeng , Congwei Liao , Shengdong Zhang","doi":"10.1016/j.displa.2025.103231","DOIUrl":"10.1016/j.displa.2025.103231","url":null,"abstract":"<div><div>This paper presents a gate driver on array (GOA) circuit capable of generating both scan and emission (EM) signals using only a single clock-set for oxide thin-film transistor (TFT)-based active-matrix organic light-emitting diode (AMOLED) displays. By embedding a combinational logic module, the generation of EM signal does not require any additional clock-sets or start signals. This significantly reduces the complexity of external driving circuits and decreases the power consumption. Furthermore, a dual-negative power supply is employed to address the stability issues caused by negative threshold voltage. The proposed gate driver has been fabricated and verified through measurements. For a medium-sized AMOLED display with a resolution of 2560 × 1440 (QHD) and resistance–capacitance (R-C) load of 3 kΩ and 120 pF, the power consumption is only 42.72mW for 1440 gate driver circuits of 120 Hz refresh rate.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103231"},"PeriodicalIF":3.4,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Displays | Pub Date: 2025-09-23 | DOI: 10.1016/j.displa.2025.103223
Gaolin Yang, Ping Shi, Jiye Zhang, Jian Xiao, Hao Zhang
{"title":"LCDiff: Line art colorization with coarse-to-fine diffusion and mask-guided voting","authors":"Gaolin Yang , Ping Shi , Jiye Zhang , Jian Xiao , Hao Zhang","doi":"10.1016/j.displa.2025.103223","DOIUrl":"10.1016/j.displa.2025.103223","url":null,"abstract":"<div><div>Line art colorization is crucial in animation production. It aims to add colors to target line art based on reference color images. The process of colorization animation remains challenging due to inadequate handling of large movements between frames, error accumulation during sequential frame processing, and color fragmentation issues during pixel-level processing. To address this issue, we propose a novel LCDiff method for line art colorization. In our method, LCDiff first utilizes a coarse-to-fine framework combining preliminary color estimation and label map diffusion modules to address the inadequate handling of large movements. Then, we introduce a color correction pathway in diffusion model that prevents error accumulation in sequential processing. Additionally, we incorporate a mask-guided voting mechanism to resolve color fragmentation issues during pixel-level processing. Extensive experiments on synthetic and real-world datasets demonstrate that our method achieves impressive performance in line art colorization.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103223"},"PeriodicalIF":3.4,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based joint detection and tracking with Euclidean edges for multi-object video analysis","authors":"Nozha Jlidi , Sameh Kouni , Olfa Jemai , Tahani Bouchrika","doi":"10.1016/j.displa.2025.103229","DOIUrl":"10.1016/j.displa.2025.103229","url":null,"abstract":"<div><div>Human detection and tracking are crucial tasks in computer vision, involving the identification and monitoring of individuals within specific areas, with applications in robotics, surveillance, and autonomous vehicles. These tasks face challenges due to variable environments, overlapping subjects, and computational limitations. To address these, we propose a novel approach using Graph Neural Networks (GNN) for joint detection and tracking (JDT) of humans in videos. Our method converts video into a graph, where nodes represent detected individuals, and edges represent connections between nodes across different frames. Node associations are established by measuring Euclidean distances between neighboring nodes, and the closest nodes are selected to form edges. This process is iteratively applied across all pairs of frames, resulting in a comprehensive graph structure for tracking. Our GNN-based JDT model was evaluated on the MOT16, MOT17, and MOT20 datasets, achieving MOTA of 85.2, ML of 11, IDF1 of 46, and MT of 65.7 on the MOT16 dataset, MOTA of 86.7 and IDF1 of 72.7 on the MOT17 dataset, and MOTA of 73.5 and IDF1 of 71.2 on the MOT20 dataset. The results demonstrate that our model outperforms existing state-of-the-art methods in both accuracy and efficiency. Through this innovative graph-based method, we contribute a robust and scalable solution to the field of human detection and tracking.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103229"},"PeriodicalIF":3.4,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145118302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}