A Chinese Continuous Sign Language Dataset Based on Complex Environments
Qidan Zhu, Jing Li, Fei Yuan, Jiaojiao Fan, Quan Gan
arXiv:2409.11960 · arXiv - CS - Computer Vision and Pattern Recognition · 2024-09-18
Citations: 0
Abstract
The current bottleneck in continuous sign language recognition (CSLR) research is that most publicly available datasets are limited to laboratory environments or television program recordings, resulting in a single background environment with uniform lighting that deviates significantly from the diversity and complexity of real-life scenarios. To address this challenge, we have constructed a new, large-scale dataset for Chinese continuous sign language (CSL) based on complex environments, termed the Complex Environment - Chinese Sign Language (CE-CSL) dataset. This dataset comprises 5,988 continuous CSL video clips collected from daily life scenes, featuring more than 70 different complex backgrounds to ensure representativeness and generalization capability. To mitigate the impact of complex backgrounds on CSLR performance, we propose a time-frequency network (TFNet) model for continuous sign language recognition. The model extracts frame-level features and then uses temporal and spectral information to derive sequence features separately before fusing them, aiming for efficient and accurate CSLR. Experimental results demonstrate that our approach achieves significant performance improvements on CE-CSL, validating its effectiveness under complex background conditions. In addition, our method also yields highly competitive results on three publicly available CSL datasets.
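The abstract describes TFNet only at a high level: frame-level features feed two parallel branches, one temporal and one spectral, whose sequence features are then fused. The paper does not specify the branch operations, so the sketch below is a minimal illustration under assumed choices — a moving-average filter standing in for the temporal branch and an FFT magnitude standing in for the spectral branch, fused by concatenation. The function name and all parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def extract_time_frequency_features(frame_feats, window=4):
    """Illustrative time-frequency fusion over frame-level features.

    frame_feats: (T, D) array, one D-dimensional feature per video frame.
    Temporal branch (assumed): moving average over `window` frames.
    Spectral branch (assumed): FFT magnitude along the time axis.
    Fusion (assumed): concatenation along the feature axis.
    """
    T, D = frame_feats.shape
    # Temporal branch: smooth each feature dimension over time.
    kernel = np.ones(window) / window
    temporal = np.stack(
        [np.convolve(frame_feats[:, d], kernel, mode="same") for d in range(D)],
        axis=1,
    )  # shape (T, D)
    # Spectral branch: per-dimension frequency magnitudes over the clip.
    spectral = np.abs(np.fft.fft(frame_feats, axis=0))  # shape (T, D)
    # Fusion: concatenate the two sequence features.
    return np.concatenate([temporal, spectral], axis=1)  # shape (T, 2*D)
```

In the actual model these branches would be learned modules (e.g. temporal convolutions and a learned frequency-domain transform) trained end-to-end with a CSLR objective; this sketch only shows the data flow implied by the abstract.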