ESS MS-G3D: extension and supplement shift MS-G3D network for the assessment of severe mental retardation
Quan Liu, Mincheng Cai, Dujuan Liu, Simeng Ma, Qianhong Zhang, Dan Xiang, Lihua Yao, Zhongchun Liu, Jun Yang
Complex & Intelligent Systems (Impact Factor 5.0; JCR Q1, Computer Science, Artificial Intelligence), published 2023-11-21
DOI: 10.1007/s40747-023-01275-1 (https://doi.org/10.1007/s40747-023-01275-1)
Citations: 0
Abstract
Automated assessment of mental retardation (MR) has the potential to improve diagnostic efficiency and objectivity in clinical practice. Building on research into the abnormal behavior characteristics of patients with MR, we propose an extension and supplement shift multi-scale G3D (ESS MS-G3D) network for video-based assessment of MR. Specifically, all videos are collected from clinical diagnostic scenarios, and human skeleton sequences are extracted from the videos with an advanced pose estimation model. To address the shortcomings of existing methods for learning behavior characteristics, we present: (1) three G3D styles, which allow the network to accept different input forms; (2) two G3D graphs and two extension graphs, which redefine and extend the graph structure of spatial–temporal nodes; (3) two learnable parameters, which adaptively adjust the graph structure; and (4) a shift layer, which enables the network to learn global features. Finally, we construct a three-branch model, ESS MS-STGC, which captures discriminative spatial–temporal features and explores the co-occurrence relationship between the spatial and temporal domains. Experiments on a clinical video data set show that the proposed model performs well in MR assessment and outperforms existing vision-based methods. In the two-class task, our model with the joint stream achieves its highest accuracy of \(94.63\%\) on the validation set and \(89.13\%\) on the test set. A multi-stream fusion strategy further improves these results to \(96.52\%\) and \(93.22\%\), respectively. In the four-class task, our model obtains a Top-1 accuracy of \(78.84\%\) and a Top-2 accuracy of \(91.34\%\) on the test set. The proposed method offers a new approach to clinical mental retardation assessment.
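The abstract reports multi-stream fusion and Top-1/Top-2 accuracy without giving the formulas. The sketch below illustrates one common way such quantities are computed in skeleton-based recognition: a weighted sum of per-stream softmax scores, followed by a top-k hit rate. This is an assumption for illustration, not the authors' stated procedure; the function names `fuse_streams` and `top_k_accuracy`, the equal default weights, and the toy numbers are all hypothetical.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(
    stream_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> list[float]:
    """Score-level fusion for one sample (assumed scheme, not from the paper).

    stream_logits holds one logit vector per stream (e.g. a joint stream
    and other streams); every vector must have the same number of classes.
    """
    if weights is None:
        weights = [1.0] * len(stream_logits)  # assumption: equal weights
    fused = [0.0] * len(stream_logits[0])
    for weight, logits in zip(weights, stream_logits):
        for cls, prob in enumerate(softmax(logits)):
            fused[cls] += weight * prob
    return fused


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda cls: sample_scores[cls],
            reverse=True,
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy four-class scores for three samples (made-up values).
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With the toy data, the second sample is wrong at Top-1 but correct at Top-2, which is the kind of gap the four-class results above reflect: Top-2 accuracy can only match or exceed Top-1.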
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.