Displays — Latest Articles

A module selection-based approach for efficient skeleton human action recognition
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-30 DOI: 10.1016/j.displa.2025.103233
Shurong Chai, Rahul Kumar Jain, Shiyu Teng, Jiaqing Liu, Tomoko Tateyama, Yen-Wei Chen
Human action recognition has become a key aspect of human–computer interaction. Existing spatial–temporal network-based methods achieve strong performance, but at a high computational cost. These methods make their final predictions with a stack of blocks, where each block contains a spatial and a temporal module for extracting the respective features. However, a fixed arrangement of these blocks may not be optimal for every sample. Moreover, their high inference time makes deployment on low-spec edge devices challenging. To resolve these limitations, we propose a decision-network-based adaptive framework that dynamically determines the arrangement of the spatial and temporal modules to ensure a cost-effective network design. To determine the optimal network structure, we investigate module selection decision-making schemes at the local and global levels. Extensive experiments on three publicly available datasets show that the proposed framework arranges the modules in an optimal way and efficiently reduces computation cost while maintaining performance. Our code is available at https://github.com/11yxk/dynamic_skeleton.
Citations: 0
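The per-sample module-selection idea described in the abstract above can be sketched as a simple gating step: a decision network scores each candidate module, and only the highly rated ones are executed for a given sample. The sketch below is purely illustrative (the function name, threshold, and score interface are assumptions, not the paper's actual design):

```python
def select_blocks(gate_scores, threshold=0.5):
    """Given per-block gate scores in [0, 1] from a decision network,
    return the indices of blocks to execute for this sample; blocks
    scoring below the threshold are skipped to save computation."""
    return [i for i, s in enumerate(gate_scores) if s >= threshold]

# Only the blocks the decision network rates highly are run:
plan = select_blocks([0.9, 0.2, 0.7, 0.4])
# plan == [0, 2]
```

Skipping low-scoring blocks is what trades a small amount of accuracy for a large reduction in inference cost on easy samples.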
Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-30 DOI: 10.1016/j.displa.2025.103234
Shujun Gao, Xiaomei Yu, Xiao Liang, Xuanchi Chen, Xiangwei Zheng
Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries in gastroscopic images significantly degrade the clinical performance of existing recognition approaches. To address these challenges, we propose GASR, a gastric anatomical site recognition method based on dual-branch perception and multi-scale semantic aggregation, for identifying five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module that exploits the complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA), inspired by Atrous Spatial Pyramid Pooling (ASPP), to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA), based on the Pyramid Scene Parsing (PSP) mechanism, to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation, show that GASR achieves a recognition accuracy of 97.15% with robust boundary discrimination and strong generalization, indicating potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.
Citations: 0
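The multi-receptive-field idea behind the MCSA module above can be illustrated with a toy 1-D analogue of ASPP-style dilated sampling: the same input is sampled at several dilation rates and the results are aggregated. The names and the reduction to 1-D are hypothetical simplifications for illustration, not the paper's implementation:

```python
def dilated_sample(seq, rate):
    """Sample a sequence at a given dilation rate: a 1-D stand-in for
    how a dilated (atrous) convolution widens its receptive field
    without adding parameters."""
    return seq[::rate]

def multi_scale_context(seq, rates=(1, 2, 4)):
    """Gather context at several dilation rates and return all views,
    mimicking the parallel branches of an ASPP-style aggregator."""
    return {r: dilated_sample(seq, r) for r in rates}

views = multi_scale_context(list(range(8)))
# views[2] == [0, 2, 4, 6]
```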
CellKAN: Cellular multi-attention Kolmogorov-Arnold networks for nuclei segmentation in histopathology images
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-29 DOI: 10.1016/j.displa.2025.103246
Zhixian Tang, Zhentao Yang, Xucheng Cai, Zhuocheng Li, Ling Wei, Pengfei Fan, Xufeng Yao
This paper presents CellKAN, a novel medical image segmentation network for nuclei detection in histopathological images. The model integrates a Multi-Scale Conv Block (MSCB), a Hybrid Multi-Dimensional Attention (HMDA) mechanism, and a Kolmogorov-Arnold Network Block (KAN-Block) to address challenges such as missed tiny lesions, heterogeneous morphology parsing, and low-contrast boundary inaccuracies. MSCB enhances multi-scale feature extraction via hierarchical refinement, while HMDA captures cross-channel-spatial dependencies through 3D convolution and dual-path pooling. The KAN-Block replaces linear weights with learnable nonlinear functions, improving model interpretability and reducing the number of parameters. Evaluated on MoNuSeg, PanNuke, and an in-house gastrointestinal dataset, CellKAN achieves Dice coefficients of 82.91%, 83.50%, and 71.38%, outperforming state-of-the-art models (e.g., U-KAN, nnUNet) by 1.29–4.49%. Ablation studies verify that MSCB and HMDA contribute 0.35% and 0.48% Dice improvements on PanNuke, respectively. The model also uses fewer parameters than nnUNet while maintaining high accuracy, balancing precision and efficiency. Visual results demonstrate its superiority in noise suppression, boundary delineation, and structural integrity, highlighting its potential for clinical pathological analysis.
Citations: 0
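The KAN idea mentioned above — replacing fixed linear weights with learnable univariate functions — can be sketched with a piecewise-linear stand-in for the spline activations a real KAN layer would use. This is an illustrative toy, assuming fixed knot positions with learnable knot values, and is not CellKAN's actual block:

```python
def kan_edge(x, knots, values):
    """KAN-style learnable 1-D function: piecewise-linear interpolation
    over fixed knots, where `values` are the learnable parameters
    (a simple stand-in for a spline activation on one edge)."""
    if x <= knots[0]:
        return values[0]          # clamp below the first knot
    if x >= knots[-1]:
        return values[-1]         # clamp above the last knot
    for k0, k1, v0, v1 in zip(knots, knots[1:], values, values[1:]):
        if k0 <= x <= k1:
            t = (x - k0) / (k1 - k0)
            return (1 - t) * v0 + t * v1

# Interpolating halfway between knots 0 and 1 with values 0 and 2:
y = kan_edge(0.5, [0.0, 1.0], [0.0, 2.0])
# y == 1.0
```

Because each edge carries its own small function, the network can express nonlinearity with fewer parameters than wide linear layers followed by fixed activations — the trade-off the abstract attributes to the KAN-Block.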
Rule-augmented LLM framework for detecting unreasonableness in ICU
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-26 DOI: 10.1016/j.displa.2025.103196
Senhao Du, Yu Huang, Qiwen Yuan, Yongliang Dai, Zhendong Shi, Menghan Hu
This paper proposes a rule-augmented model system, built mainly on a large language model (LLM), for detecting unreasonable activities during Intensive Care Unit (ICU) hospitalization. The system is based on DeepSeek-R1-32B and integrates known unreasonable ICU hospitalization activities into health insurance systems through prompt learning techniques. Compared with traditional fixed-threshold rules, the rule-augmented large model can identify errors and exhibits a degree of emergent capability. In addition, it provides detailed, interpretable explanations for detected unreasonableness, helping health insurance fund supervisors perform efficient and accurate reviews. The framework includes two main sub-models: a discriminator for rule judgment and an evaluator for accuracy enhancement. Training data were derived from anonymized records from multiple hospitals and pre-processed to form the first domestic dataset tailored to unreasonable ICU billing detection. The experimental results validate the effectiveness and practical value of the proposed system.
Citations: 0
Dual-stage attention based symmetric framework for stereo video quality assessment
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-25 DOI: 10.1016/j.displa.2025.103232
Kairui Zhang, Xiao Ke, Xin Chen
The compelling creative capabilities of stereo video have drawn scholarly attention to its quality. Given the substantial challenge posed by asymmetric distortion in stereoscopic visual perception for stereoscopic video quality assessment (SVQA), this study introduces D³Net (Dual Branch, Dual-stage Attention, Dual Task), a novel framework for stereoscopic video quality assessment. Leveraging its dual-task architecture, D³Net employs a dual-branch independent prediction mechanism for the left and right views. This approach not only addresses the prevalent issue of asymmetric distortion in stereoscopic videos but also pinpoints which view drags the overall score down. To overcome the limitations of existing models in capturing global detail, D³Net incorporates a two-stage distorted attention fusion module that fuses video features at both the block and pixel levels, strengthening the model's attention to global detail and its processing capability, and consequently its overall performance. D³Net exhibits exceptional performance across mainstream and cross-domain datasets, establishing itself as the current state of the art (SOTA).
Citations: 0
Metal oxide TFTs gate driver and analog PWM pixel circuit employing progressive slope-compensated ramp signal for micro-LED displays
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-23 DOI: 10.1016/j.displa.2025.103230
Lirong Zhang, Lei Zhou, Zhong Zheng, Zhaohua Zhou, Miao Xu, Lei Wang, Weijing Wu, Junbiao Peng
A new metal oxide thin-film transistor (MO TFT) gate driver is presented for micro light-emitting diode (Micro-LED) displays with a line-by-line driving method, in which progressive, adjustable slope-compensated ramp signals are applied to each row of the pixel array. A compensated analog pulse-width modulation (PWM) pixel circuit is presented to construct the Micro-LED driving framework. The proposed gate driver, with one input module and three output modules, provides all control signals for the pixel array without any external integrated circuits (ICs), simplifying the driving system. Experimental results show that the gate driver outputs the integrated signals, including SCAN, EM, and PWM, and that the pixel circuit with a single Micro-LED chip successfully achieves different grayscale levels from 100 to 3000 cd/m². The slope and the Micro-LED current (I_LED) can be adjusted by applying an external bias: the slope ranges from −0.35 to −0.57 within a bias range of −6 to −7 V, while I_LED varies from 17.3 to 61.7 μA under a bias range of 3 to 9 V. The error rates of slope and brightness stay within 2% and 5%, with a V_th shift of about ±0.7 V, after a 1.5 h positive and negative bias stress test of the TFT. Moreover, the proposed gate driver and pixel circuit have been verified to operate normally at high speed, with SCAN output widths of 8.68 μs, 6.51 μs, and 4.32 μs, making them suitable for high-resolution Micro-LED displays.
Citations: 0
Less is more: An effective method to extract object features for visual dynamic SLAM
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-23 DOI: 10.1016/j.displa.2025.103224
Jianbo Zhang, Liang Yuan, Teng Ran, Jun Jia, Shuo Yang, Long Tang
Visual Simultaneous Localization and Mapping (VSLAM) is an essential foundation of augmented reality (AR) and mobile robotics. Dynamic scenes in the real world are a main challenge for VSLAM because they contravene its fundamental assumption of a static environment. Joint pose optimization with dynamic object modeling and camera pose estimation is a novel approach, but modeling the motion of both the camera and a dynamic object when both are moving simultaneously is challenging. In this paper, we propose an efficient feature extraction approach for modeling dynamic object motion. We describe the object comprehensively through a more selective feature strategy, which improves object tracking and pose estimation. The proposed approach combines image gradients and feature-point clustering on dynamic objects. In the back-end optimization stage, we introduce rigid constraints on the dynamic object to optimize the poses using a graph model, obtaining high accuracy. Experimental results on the KITTI datasets demonstrate that the proposed approach is efficient and accurate.
Citations: 0
Dual-output compact gate driver circuit design with embedded combinational logic for oxide TFT-based AMOLED displays
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-23 DOI: 10.1016/j.displa.2025.103231
Pu Liang, Yuxuan Zhu, Haohang Zeng, Congwei Liao, Shengdong Zhang
This paper presents a gate driver on array (GOA) circuit that generates both scan and emission (EM) signals using only a single clock set, for oxide thin-film transistor (TFT)-based active-matrix organic light-emitting diode (AMOLED) displays. By embedding a combinational logic module, EM signal generation requires no additional clock sets or start signals, significantly reducing the complexity of the external driving circuits and the power consumption. Furthermore, a dual negative power supply is employed to address stability issues caused by negative threshold voltages. The proposed gate driver has been fabricated and verified through measurements. For a medium-sized AMOLED display with a resolution of 2560 × 1440 (QHD) and a resistance–capacitance (R-C) load of 3 kΩ and 120 pF, the power consumption is only 42.72 mW for 1440 gate driver circuits at a 120 Hz refresh rate.
Citations: 0
LCDiff: Line art colorization with coarse-to-fine diffusion and mask-guided voting
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-23 DOI: 10.1016/j.displa.2025.103223
Gaolin Yang, Ping Shi, Jiye Zhang, Jian Xiao, Hao Zhang
Line art colorization is crucial in animation production; it aims to add colors to a target line art based on reference color images. Colorizing animation remains challenging due to inadequate handling of large movements between frames, error accumulation during sequential frame processing, and color fragmentation during pixel-level processing. To address these issues, we propose LCDiff, a novel method for line art colorization. LCDiff first uses a coarse-to-fine framework combining preliminary color estimation and label-map diffusion modules to handle large movements between frames. We then introduce a color correction pathway in the diffusion model that prevents error accumulation in sequential processing. Additionally, we incorporate a mask-guided voting mechanism to resolve color fragmentation during pixel-level processing. Extensive experiments on synthetic and real-world datasets demonstrate that our method achieves impressive performance in line art colorization.
Citations: 0
Graph-based joint detection and tracking with Euclidean edges for multi-object video analysis
IF 3.4 · CAS Q2, Engineering & Technology
Displays Pub Date: 2025-09-22 DOI: 10.1016/j.displa.2025.103229
Nozha Jlidi, Sameh Kouni, Olfa Jemai, Tahani Bouchrika
Human detection and tracking are crucial tasks in computer vision, involving the identification and monitoring of individuals within specific areas, with applications in robotics, surveillance, and autonomous vehicles. These tasks face challenges from variable environments, overlapping subjects, and computational limitations. To address them, we propose a novel approach using Graph Neural Networks (GNN) for joint detection and tracking (JDT) of humans in videos. Our method converts a video into a graph, where nodes represent detected individuals and edges represent connections between nodes across frames. Node associations are established by measuring Euclidean distances between neighboring nodes, and the closest nodes are selected to form edges; this process is applied iteratively across all pairs of frames, yielding a comprehensive graph structure for tracking. Our GNN-based JDT model was evaluated on the MOT16, MOT17, and MOT20 datasets, achieving MOTA of 85.2, ML of 11, IDF1 of 46, and MT of 65.7 on MOT16; MOTA of 86.7 and IDF1 of 72.7 on MOT17; and MOTA of 73.5 and IDF1 of 71.2 on MOT20. The results demonstrate that our model outperforms existing state-of-the-art methods in both accuracy and efficiency. Through this graph-based method, we contribute a robust and scalable solution to human detection and tracking.
Citations: 0
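The graph-construction step described in the abstract above — linking each detection to its nearest Euclidean neighbour in the next frame — can be sketched directly. The function below is an illustrative simplification (detections reduced to 2-D centres, greedy nearest-neighbour matching), not the paper's full pipeline:

```python
import math

def build_edges(frame_a, frame_b):
    """For each detection centre (x, y) in frame_a, add an edge to its
    nearest detection in frame_b by Euclidean distance, forming the
    cross-frame links of the tracking graph."""
    edges = []
    for i, (xa, ya) in enumerate(frame_a):
        j = min(range(len(frame_b)),
                key=lambda j: math.hypot(xa - frame_b[j][0],
                                         ya - frame_b[j][1]))
        edges.append((i, j))
    return edges

# Two people moving slightly between frames keep their identities:
links = build_edges([(0, 0), (10, 10)], [(1, 1), (9, 9)])
# links == [(0, 0), (1, 1)]
```

Repeating this over consecutive frame pairs yields the full graph on which a GNN can then reason about identities.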