PakSign: Advancing dynamic Pakistani Sign Language Recognition with a novel skeleton-based dataset and graph-enhanced architectures

IF 3.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Neelma Naz, Maheen Salman, Fiza Ayub, Zawata Afnan Asif, Sara Ali
{"title":"PakSign: Advancing dynamic Pakistani Sign Language Recognition with a novel skeleton-based dataset and graph-enhanced architectures","authors":"Neelma Naz ,&nbsp;Maheen Salman ,&nbsp;Fiza Ayub ,&nbsp;Zawata Afnan Asif ,&nbsp;Sara Ali","doi":"10.1016/j.cviu.2025.104458","DOIUrl":null,"url":null,"abstract":"<div><div>Sign Language Recognition (SLR) is a critical yet complex task in pattern recognition and computer vision due to the visual-gestural nature of sign languages. While regional variants like American, British, and Chinese Sign Languages have seen significant research advancements, Pakistani Sign Language (PSL) remains underexplored, mostly limited to static Urdu alphabet recognition rather than dynamic gestures used in daily communication. The scarcity of large-scale PSL datasets further hinders the training of deep learning models, which require extensive data. This work addresses these gaps by introducing a novel skeleton-based PSL dataset comprising over 1280 pose sequences of 52 Urdu signs, each performed five times by five different signers. We detail the data collection protocol and evaluate lightweight, pose-based baseline models using a K-fold cross-validation protocol. Furthermore, we propose Efficient-Sign, a novel recognition pipeline with two variants: B0, achieving a 2.28% accuracy gain with 35.37% fewer FLOPs and 63.55% fewer parameters, and B4, yielding a 3.48% accuracy improvement and 14.95% fewer parameters when compared to state-of-the-art model. We also conduct cross-dataset evaluations on widely-used benchmarks such as WLASL-100 and MINDS-Libras, where Efficient-Sign maintains competitive accuracy with substantially fewer parameters and computational overhead. These results confirm the model’s generalizability and robustness across diverse sign languages and signer populations. This work contributes significantly by providing a publicly available pose-based PSL dataset, strong baseline evaluations, and an efficient architecture for benchmarking future research, marking a critical advancement in dynamic PSL recognition and establishing a foundation for scalable, real-world SLR systems.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"260 ","pages":"Article 104458"},"PeriodicalIF":3.5000,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422500181X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Sign Language Recognition (SLR) is a critical yet complex task in pattern recognition and computer vision due to the visual-gestural nature of sign languages. While regional variants like American, British, and Chinese Sign Languages have seen significant research advancements, Pakistani Sign Language (PSL) remains underexplored, mostly limited to static Urdu alphabet recognition rather than dynamic gestures used in daily communication. The scarcity of large-scale PSL datasets further hinders the training of deep learning models, which require extensive data. This work addresses these gaps by introducing a novel skeleton-based PSL dataset comprising over 1280 pose sequences of 52 Urdu signs, each performed five times by five different signers. We detail the data collection protocol and evaluate lightweight, pose-based baseline models using a K-fold cross-validation protocol. Furthermore, we propose Efficient-Sign, a novel recognition pipeline with two variants: B0, achieving a 2.28% accuracy gain with 35.37% fewer FLOPs and 63.55% fewer parameters, and B4, yielding a 3.48% accuracy improvement and 14.95% fewer parameters when compared to the state-of-the-art model. We also conduct cross-dataset evaluations on widely used benchmarks such as WLASL-100 and MINDS-Libras, where Efficient-Sign maintains competitive accuracy with substantially fewer parameters and computational overhead. These results confirm the model’s generalizability and robustness across diverse sign languages and signer populations. This work contributes significantly by providing a publicly available pose-based PSL dataset, strong baseline evaluations, and an efficient architecture for benchmarking future research, marking a critical advancement in dynamic PSL recognition and establishing a foundation for scalable, real-world SLR systems.
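The abstract specifies the dataset composition (52 Urdu signs, 5 signers, 5 repetitions each, over 1280 pose sequences) and a K-fold cross-validation evaluation protocol, but not the implementation details. The following is a minimal sketch of such a protocol; the function name evaluate_kfold, the build_model factory, and the sequence format are hypothetical placeholders, and the exact fold definition used by the authors (e.g., whether splits are signer-independent) is not stated in the abstract.

```python
import numpy as np
from sklearn.model_selection import KFold

NUM_SIGNS = 52        # Urdu signs in the PakSign dataset
NUM_SIGNERS = 5       # distinct signers
NUM_REPETITIONS = 5   # repetitions per sign per signer
# 52 signs x 5 signers x 5 repetitions = 1300 recordings, consistent with the
# abstract's "over 1280 pose sequences" after any quality filtering.

def evaluate_kfold(sequences, labels, build_model, k=5, seed=0):
    """Average accuracy of a pose-based classifier over K folds.

    sequences   : list of skeleton sequences, e.g. arrays of shape (frames, joints, coords)
    labels      : integer sign labels in [0, NUM_SIGNS)
    build_model : factory returning a classifier with fit(X, y) and score(X, y)
    """
    labels = np.asarray(labels)
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    fold_acc = []
    for train_idx, test_idx in kf.split(labels):
        model = build_model()
        model.fit([sequences[i] for i in train_idx], labels[train_idx])
        fold_acc.append(model.score([sequences[i] for i in test_idx], labels[test_idx]))
    # Report mean and spread across folds, as is standard for K-fold evaluation.
    return float(np.mean(fold_acc)), float(np.std(fold_acc))
```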
Source Journal
Computer Vision and Image Understanding
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days
Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: Theory; Early vision; Data structures and representations; Shape; Range; Motion; Matching and recognition; Architecture and languages; Vision systems.